Nonlinear Stochastic Receding Horizon Control: Stability,
Robustness and Monte Carlo Methods for Control Approximation* 

Francesco Bertoli^ Adrian N. Bishop^ 


Abstract 

This work considers the stability of nonlinear stochastic receding horizon control when the optimal controller
is only computed approximately. A number of general classes of controller approximation error are analysed, including
deterministic and probabilistic errors and even controller sample and hold errors. In each case, it is shown
that the controller approximation errors do not accumulate (even over an infinite time frame) and the process
converges exponentially fast to a small neighbourhood of the origin. In addition to this analysis, an approximation
method for receding horizon optimal control is proposed based on Monte Carlo simulation. This method is derived
via the Feynman-Kac formula which gives a stochastic interpretation for the solution of a Hamilton-Jacobi-Bellman
equation associated with the true optimal controller. It is shown, and it is a prime motivation for this study, that
this particular controller approximation method practically stabilises the underlying nonlinear process.


1 Introduction 

Receding horizon optimal control (RHC) is a strategy for controlling a dynamical system over a (possibly) infinite
horizon where the control input at any instant is derived by solving a finite horizon optimal control problem over a
fixed length horizon from that instant forwards. An introduction to RHC can be found in [29] [22]. RHC is a natural
extension of finite-horizon optimal control and a natural simplification of infinite-horizon optimal control. The
term model predictive control is often used interchangeably with RHC.

The contribution of this work is: 

1 We study the stability of continuous-time receding horizon control of nonlinear stochastic systems when the
optimal controller computation is only approximate. In particular, we consider a number of classes of controller
approximation error including deterministic and probabilistic errors. We also consider controller
sample and hold errors that arise due to, e.g., real-time computing limitations.

2a We outline a (Monte Carlo) simulation algorithm for approximating the optimal receding horizon control for 
nonlinear stochastic continuous-time systems. 

2b We connect the controller approximation technique to the stability analysis detailed in this work and show 
that this Monte Carlo simulation method for controller approximation stabilises the process (in a sense to 
be made precise). In particular, the approximation errors do not accumulate nor destabilise the system. 

The analysis of RHC for nonlinear (deterministic) systems started, largely, with the analysis of Mayne et al.
[28] [31]. Even in this early work, stability was considered for RHC in the presence of a number of controller
approximation errors. Broad work on this topic in the nonlinear realm is covered in [33] [7] [13]. Much work in this
area has focused on the incorporation of (deterministic) model uncertainty [31] [6] [25] [24] and/or controller and
state constraints E]. This latter focus concerning constraints is largely beyond the scope of this study, although we
comment on possible extensions in our concluding remarks.

In the stochastic realm, the foundations of optimal control of nonlinear (continuous-time) systems are studied
in, e.g., [10] [20] [43] [21] [39]. Computational methods for general nonlinear stochastic RHC are given in, e.g., [34]
|^ [16] [3ni [23] [14] [35] [8] [3]. In continuous-time cases, optimal nonlinear RHC has been shown [43] to be stabilising
under the assumption that an optimal controller is applied (exactly). In this work, we extend [43] by showing that
stability is retained even in the presence of controller approximation errors. We consider a number of general classes
of controller approximation error, including deterministic and probabilistic errors and also controller sample and
hold errors. To the best of our knowledge, there has been no investigation of the stochastic stability of (nonlinear
stochastic) RHC in the presence of controller approximation errors.

* This work was supported by AFOSR/AOARD via AOARD-144042. 

^F. Bertoli is with the Australian National University (ANU) and NICTA. He is supported by NICTA.

^A.N. Bishop is with the University of Technology Sydney (UTS) and NICTA. He is also an adjunct Fellow at the Australian National University 
(ANU). He is supported by NICTA and the Australian Research Council (ARC) via a Discovery Early Career Researcher Award (DE-120102873). 



In this work we also outline an approximation method for computing the optimal RHC for nonlinear stochastic
continuous-time systems. This method is based on Monte Carlo integral approximation and originates in the work
of Kappen [15] [16] where such techniques were applied in finite-horizon optimal control for nonlinear stochastic
systems. The broad idea is that a solution to the Hamilton-Jacobi-Bellman partial differential equation associated
with a typical optimal control problem [10] can be formulated in terms of an expectation over a stochastic trajectory
defined by an uncontrolled stochastic differential equation (SDE). Indeed, this relationship between partial differential
equations and so-called path-integrals is just a consequence of the Feynman-Kac formula [10]. This expectation
(or path-integral) can then be approximated via Monte Carlo simulation, and since this defines the solution to the
Hamilton-Jacobi-Bellman equation, it is a short leap from there to the optimal controller (or its approximation).
This numerical algorithm for optimal control has received interest in, e.g., [40] [38] [36] [41] [32] where a number of
generalisations (and applications) have been investigated. To the best of our knowledge, no investigation of the
stochastic stability of this approximation method has been considered.
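The path-integral idea described above can be sketched numerically. The snippet below is a minimal illustration and not the algorithm of this paper: it assumes the exponentiated-cost form common in Kappen-style path-integral control, where a desirability function psi(x) = E[exp(-(terminal cost + integrated state cost)/temperature)] is estimated by simulating the uncontrolled scalar SDE, and the value is recovered as -temperature * log(psi). The function name `path_integral_value`, the parameter `temperature`, and the scalar setting are all hypothetical choices made for the example.

```python
import numpy as np

def path_integral_value(x, T, n_steps, n_paths, drift, sigma,
                        state_cost, terminal_cost, temperature, rng):
    """Monte Carlo estimate of psi(x) = E[exp(-S/temperature)] and of
    -temperature*log(psi), where S is the cost accumulated along paths of
    the UNCONTROLLED scalar SDE dX = drift(X) dt + sigma dW started at x.
    (Assumed exponentiated-cost form; see the hedged lead-in.)"""
    dt = T / n_steps
    xs = np.full(n_paths, float(x))      # all sample paths start at x
    s = np.zeros(n_paths)                # accumulated path cost
    for _ in range(n_steps):
        s += state_cost(xs) * dt         # left-endpoint quadrature
        dw = rng.normal(0.0, np.sqrt(dt), size=n_paths)
        xs = xs + drift(xs) * dt + sigma * dw   # Euler-Maruyama step
    s += terminal_cost(xs)
    psi = float(np.mean(np.exp(-s / temperature)))
    return -temperature * np.log(psi), psi

# Hypothetical example: pure Brownian motion with a quadratic terminal cost.
rng = np.random.default_rng(0)
value, psi = path_integral_value(
    x=1.0, T=1.0, n_steps=10, n_paths=200_000,
    drift=lambda x: np.zeros_like(x), sigma=1.0,
    state_cost=lambda x: np.zeros_like(x),
    terminal_cost=lambda x: 0.5 * x**2,
    temperature=1.0, rng=rng)
```

For this particular example the expectation has a closed form (a Gaussian integral), which gives a convenient sanity check on the sampler before it is applied to a nonlinear model.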

A contribution of this work is the analysis of this Monte Carlo based controller approximation method, as it 
applies to RHC of nonlinear stochastic continuous-time systems. In particular, we relate this approximation method 
to the more general stability analysis provided in this work and we show that this method stabilises the system (in a 
specific sense to be defined). This stability analysis justifies application of this control algorithm over an extended, 
possibly infinite, time interval. 

The remainder of this work is organised as follows: In Section 2 we outline the basic nonlinear stochastic RHC
problem and some related notation. In Section 3 we consider the stability of the nonlinear stochastic RHC regime.
In particular, we note the stabilisation properties of the optimal (ideal) controller and we analyse the stability
properties of a number of controller approximation methods. In Section 4 we introduce the Monte Carlo based algorithm
for controller approximation and we relate this algorithm and its stability properties to the results given in the
previous section. In Section 5 we provide some concluding remarks and comment on a number of possible extensions.


2 Nonlinear Stochastic Receding Horizon Optimal Control 


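Let $(\Omega, \mathcal{F}, P)$ be a complete probability space equipped with the natural filtration $(\mathcal{F}_t)_{t \ge 0}$ generated by a fixed,
standard, Wiener process $W_t(\omega) : [0,\infty) \times \Omega \to \mathbb{R}^d$. We consider a nonlinear controlled process $X_t^{0,x_0,u}(\omega) : [0,\infty) \times \Omega \to \mathbb{R}^n$ satisfying

$$dX_t^{0,x_0,u} = f(X_t^{0,x_0,u}, u_t)\,dt + g(X_t^{0,x_0,u})\,dW_t \qquad (1)$$

with $X_0^{0,x_0,u} = x_0 \in \mathbb{R}^n$. We assume $f : \mathbb{R}^n \times \mathbb{R}^m \to \mathbb{R}^n$ and $g : \mathbb{R}^n \to \mathbb{R}^{n \times d}$ to be continuous. The stochastic integrals in
this paper are to be read in the Itô sense [1]. Moreover we assume that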

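$$|f(x,u) - f(y,u)| + |g(x) - g(y)| \le c_1 |x - y|, \quad \forall (x,y,u) \in \mathbb{R}^n \times \mathbb{R}^n \times U$$

$$|f(x,u) - f(x,v)| \le c_1 |u - v|, \quad \forall (x,u,v) \in \mathbb{R}^n \times U \times U$$

for some finite constant $c_1 > 0$. Let $t \ge s \ge 0$, then the superscripts in $X_t^{s,x,u}$ denote that the initial state at $s \ge 0$ is $x$
and the control history is $(u_t)_{t \ge s}$.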

Fix a time interval $[t_0, t_1]$. Then a control $u_t(\omega) : [t_0, t_1] \times \Omega \to U$ is said to be admissible if it is Borel measurable
and

$$E\left[\int_{t_0}^{t_1} |u(X_s^{t_0,x,u})|^q\,ds\right] < \infty, \quad \forall x \in \mathbb{R}^n,\ q \ge 1$$


We denote by $\mathcal{U}[t_0, t_1]$ the class of admissible controls on $[t_0, t_1]$. These conditions are sufficient for the existence of
a unique, continuous, (strong) solution to the stochastic process; e.g. see [1] [39].
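To make the controlled SDE (1) concrete, the following is a minimal Euler-Maruyama simulation sketch. It is an illustration only: the function name `euler_maruyama`, the feedback `policy`, and the scalar example system are hypothetical and are not taken from the paper.

```python
import numpy as np

def euler_maruyama(f, g, policy, x0, t_grid, rng):
    """Simulate one path of dX = f(X, u) dt + g(X) dW on a fixed time grid.

    f(x, u) returns an n-vector, g(x) returns an n-by-d matrix and
    policy(t, x) returns the control applied at time t in state x.
    """
    x = np.asarray(x0, dtype=float)
    path = [x.copy()]
    for t0, t1 in zip(t_grid[:-1], t_grid[1:]):
        dt = t1 - t0
        u = policy(t0, x)                      # feedback control u_t(x)
        gx = g(x)                              # diffusion matrix at x
        dw = rng.normal(0.0, np.sqrt(dt), size=gx.shape[1])  # Wiener increment
        x = x + f(x, u) * dt + gx @ dw         # one Euler-Maruyama step
        path.append(x.copy())
    return np.array(path)

# Hypothetical scalar example: unstable drift stabilised by linear feedback.
f = lambda x, u: 0.5 * x + u
g = lambda x: np.array([[0.2]])
policy = lambda t, x: -2.0 * x
rng = np.random.default_rng(0)
t_grid = np.linspace(0.0, 5.0, 501)
path = euler_maruyama(f, g, policy, [2.0], t_grid, rng)
```

Both `f` and `g` in this example are globally Lipschitz, in line with the standing assumptions, and the noise does not vanish at the origin, so the path settles into a neighbourhood of the origin rather than at the origin itself.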

Here we consider control and stabilisation to (a neighbourhood of) the origin; any other desired set point can
be substituted via a simple change of coordinates. To this end, we fix $f(0, u(0)) = 0$, i.e. the origin is an equilibrium
point for the nominal deterministic system.

Let $T > 0$ be fixed. We associate with (1) the following receding horizon cost functional

$$w(t,s,x,u) := E\left[\varphi(X_{T+t}^{t+s,x,u}) + \int_{t+s}^{T+t} \ell(X_r^{t+s,x,u}, u_r)\,dr\right] \quad \text{for } s \in [0,T]$$


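where $\ell : \mathbb{R}^n \times \mathbb{R}^m \to [0,\infty)$ and $\varphi : \mathbb{R}^n \to [0,\infty)$ are non-negative continuous functions that satisfy

$$c_2 |x|^p \le \varphi(x) \le c_3 (1 + |x|^p), \quad \forall x \in \mathbb{R}^n$$

and

$$c_2 (|x|^p + |u|^p) \le \ell(x,u) \le c_3 (1 + |x|^p + |u|^p), \quad \forall (x,u) \in \mathbb{R}^n \times U$$

for some finite (independent) constants $c_2, c_3 > 0$ and $p \ge 1$. Further, $\varphi(0) = 0$ and $\ell(0, u(0)) = 0$.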





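We define a value functional as

$$v(t,s,x) := \inf_{u_r \in \mathcal{U}[t+s,\,t+T]} w(t,s,x,u) = \inf_{u_r \in \mathcal{U}[t+s,\,t+T]} E\left[\varphi(X_{T+t}^{t+s,x,u}) + \int_{t+s}^{T+t} \ell(X_r^{t+s,x,u}, u_r)\,dr\right] \qquad (2)$$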


and denote by $\bar{u}_r(x)$, if it exists, the optimal control, i.e. the admissible control process over the finite horizon
$[t, T+t]$ that minimizes $w$. In (one-step) RHC it is necessary just to compute $\bar{u}_r(x)$ for $r = t$ at which point the cost
functional (and thus the value functional) changes to capture the receding horizon.
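For a fixed (not necessarily optimal) feedback law, the cost functional above can be estimated by plain Monte Carlo. The sketch below is a hedged illustration for a scalar state with $t = s = 0$; the function name `mc_cost`, the left-endpoint quadrature, and the example data are assumptions introduced here and are not part of the paper.

```python
import numpy as np

def mc_cost(f, g, ell, phi, policy, x, T, n_steps, n_paths, rng):
    """Monte Carlo estimate of E[phi(X_T) + int_0^T ell(X_r, u_r) dr]
    for a scalar state, using Euler-Maruyama paths started at x."""
    dt = T / n_steps
    xs = np.full(n_paths, float(x))
    running = np.zeros(n_paths)
    for k in range(n_steps):
        u = policy(k * dt, xs)
        running += ell(xs, u) * dt               # left-endpoint quadrature
        dw = rng.normal(0.0, np.sqrt(dt), size=n_paths)
        xs = xs + f(xs, u) * dt + g(xs) * dw     # Euler-Maruyama step
    return float(np.mean(phi(xs) + running))

# Hypothetical example with quadratic costs (p = 2) and linear feedback.
rng = np.random.default_rng(1)
w_hat = mc_cost(
    f=lambda x, u: 0.5 * x + u,
    g=lambda x: 0.2 * np.ones_like(x),
    ell=lambda x, u: x**2 + u**2,
    phi=lambda x: x**2,
    policy=lambda t, x: -2.0 * x,
    x=1.0, T=1.0, n_steps=100, n_paths=5000, rng=rng)
```

Minimising such an estimate over a parametrised family of controls is one crude way to approximate the value functional (2); it is not the method developed in this paper.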

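The value functional is time-invariant with respect to the first argument, in the sense that

$$v(t,s,x) = \inf_{u_r \in \mathcal{U}[t+s,\,t+T]} E\left[\varphi(X_{T+t}^{t+s,x,u}) + \int_{t+s}^{T+t} \ell(X_r^{t+s,x,u}, u_r(x))\,dr\right] = \inf_{u_r \in \mathcal{U}[s,\,T]} E\left[\varphi(X_T^{s,x,u}) + \int_s^T \ell(X_r^{s,x,u}, u_r(x))\,dr\right] = v(0,s,x)$$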


Note that when viewed over $s \in [0, T]$ the value function represents the so-called value-to-go over the fixed finite
horizon $[t, T + t]$. Going forward, we often write $v(x)$ in place of $v(t, 0, x)$ or $v(s, x)$ in place of $v(t, s, x)$ when dealing
with the value-to-go function.

With the modelling hypotheses adopted thus far, we have the following key lemma. 

Lemma 1. There exists a pair of positive constants $c_4, c_5$, depending only on $p, T, c_1, c_2, c_3$, such that

$$c_4 |x|^p \le v(x) \le c_5 (1 + |x|^p), \quad \forall x \in \mathbb{R}^n \qquad (3)$$

and thus $v(x) \to \infty$ with $|x| \to \infty$.

Proof. The proof of the lemma is given in the appendix. □ 


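Going forward we write $\partial_x v(x)$ and $\partial_{xx} v(x)$ for the Jacobian vector and Hessian matrix respectively. Furthermore,

$$\partial_s v(s,x) := \frac{\partial v(s,x)}{\partial s} = \frac{\partial}{\partial s} \inf_{u_r \in \mathcal{U}[s,T]} E\left[\varphi(X_T^{s,x,u}) + \int_s^T \ell(X_r^{s,x,u}, u_r(x))\,dr\right]$$

where $\partial_s v(s,x)$ is defined on $s \in [0, T]$ for any $t \ge 0$. We often write $\partial_s v(x)$ with $s \in [0, T]$ in place of $\partial_s v(s,x)$ for
brevity.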

Under certain conditions, at any time $s \in [0, T]$ the following Hamilton-Jacobi-Bellman (HJB) equation can be
associated with the general value functional

$$-\partial_s v(s,x) = \inf_{u \in U} \left[\ell(x,u) + f(x,u)^\top \partial_x v(s,x) + \tfrac{1}{2}\mathrm{tr}\left[g(x)g(x)^\top \partial_{xx} v(s,x)\right]\right]$$

with a terminal boundary condition $v(T,x) = \varphi(x)$. This association is in the sense that a suitably smooth solution to
the HJB equation, if it exists, coincides with the value-to-go [10]. Note that in (one-step) RHC we are only interested
in the solution of the HJB equation $v(s, x)$ at time $s = 0$ on the interval $s \in [0, T]$.

We note that assumptions introduced in this work are assumed to hold from the point at which they are introduced
throughout the remainder of the work. The modelling hypotheses, e.g. on the functions $f$, $g$, $\varphi$, $\ell$, etc. are
assumed to hold throughout the remainder until the point they are refined (typically specialised) and from which
point the refinement is supposed to hold.

Assumption 1. We assume that the modelling assumptions outlined to this point are augmented (where and how
necessary) to ensure $v(s, x) : [0, T] \times \mathbb{R}^n \to [0, \infty)$ is once continuously differentiable in $s \in [0, T]$ and twice continuously
differentiable in $x$ for all $x \in \mathbb{R}^n \setminus \{0\}$ and that $v(s,x)$ is a solution to the corresponding associated HJB equation.

Sufficient conditions for this assumption to hold (in addition to the modelling hypotheses introduced thus far)
are given in, e.g., [10]. Typically, these sufficient conditions take the form of further boundedness or regularity
assumptions on the system and cost functions and/or their (partial) derivatives and are not overly restrictive.^1 We
could also move from considering classical solutions of the HJB equation to generalised or viscosity solutions [10].

^1 It is noteworthy that while a classical solution to the HJB equation arising in deterministic optimal control is not typical, it is well-known
[19] [ISlIin!lll] that the stochastic optimal control problem is quite generally 'more regular'. Indeed, under an assumption of uniform parabolicity,
i.e. uniform positive-definiteness of $g(x)g(x)^\top$, it generally follows that a classical (unique) solution to the HJB equation will exist in the
stochastic setting (under mild regularity assumptions on the model/cost); see Chapter IV.4 in [10] or Krylov [20]. Under the modelling hypotheses
introduced in this work, uniform parabolicity follows if $g(x)$ is full-rank (there is no real loss of generality here) and if $g(x)$ is bounded below
(which is compatible with the hypotheses thus far). Separately, with an added Lipschitz assumption on the cost (compatible with the hypotheses
herein), a classical solution to the HJB equation not only exists but is indeed Lipschitz [43] [4]. This Lipschitz setting is commonly assumed when
studying the characteristics of stochastic optimal control; e.g. see [43]. Note, we do not generally require (or ask) for this Lipschitz property here.
In any case, assumptions of this type concerning the existence of a classical solution are common in the analysis of both deterministic min
and stochastic optimal control m.












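Given the admissible optimal control process $\bar{u}_s(x)$ over the finite horizon $s \in [0, T]$ then the optimal value function
over the finite horizon from any $t \ge 0$ to $T + t$ is

$$v(x) = E\left[\varphi(X_{T+t}^{t,x,\bar{u}}) + \int_t^{T+t} \ell(X_r^{t,x,\bar{u}}, \bar{u}_r(x))\,dr\right]$$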

and, given Assumption 1, the value-to-go satisfies the following HJB equation

$$-\partial_s v(s,x) = \ell(x, \bar{u}_s(x)) + f(x, \bar{u}_s(x))^\top \partial_x v(s,x) + \tfrac{1}{2}\mathrm{tr}\left[g(x)g(x)^\top \partial_{xx} v(s,x)\right] \qquad (4)$$

on $s \in [0, T]$ with the terminal boundary condition $v(T, x) = \varphi(x)$. We will use the following assumption.

Assumption 2. We assume $\partial_s v(x)|_{s=t} \ge 0$ at any time $t \ge 0$ where $s \in [t, T + t]$.

This assumption implies the optimal cost, when viewed at the start of a finite horizon, is increasing with decreasing
horizon length.^2 One way to interpret this is that if the horizon length is reduced then the control action
has less time to stabilize the system and thus the terminal cost is likely to be greater even though the running cost
might be reduced.

Finally, we highlight again that in optimal RHC, at any time $t \ge 0$, the applied control is just $u^*(x) = \bar{u}_s(x)|_{s=0}$
and the remaining, finite horizon, controls $\bar{u}_s(x)$ over $0 < s \le T$ are discarded.
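The receding horizon loop itself can be sketched as follows. This is a deliberately crude, deterministic stand-in and not the controller analysed in this paper: the finite-horizon problem is "solved" by searching a grid of constant candidate controls, the noise is dropped, and only the first control is applied before the problem is re-solved from the new state. The names `rhc_step`, `run_rhc`, `rollout_cost` and the scalar model are hypothetical.

```python
import numpy as np

def rollout_cost(x, u, horizon=0.5, n=25):
    """Finite-horizon cost of holding the constant control u from state x,
    for the hypothetical scalar model dx/dt = x + u (noise omitted)."""
    dt_h = horizon / n
    cost = 0.0
    for _ in range(n):
        cost += (x * x + 0.1 * u * u) * dt_h   # running cost
        x = x + (x + u) * dt_h                 # Euler step of the model
    return cost + x * x                        # terminal cost

def rhc_step(x, candidates, cost_fn):
    """Approximate the horizon problem by a grid search over controls."""
    costs = [cost_fn(x, u) for u in candidates]
    return candidates[int(np.argmin(costs))]

def run_rhc(x0, n_steps, dt, candidates, cost_fn):
    """Apply only the first control, advance the state, then re-solve."""
    x, xs = float(x0), [float(x0)]
    for _ in range(n_steps):
        u = rhc_step(x, candidates, cost_fn)   # re-solve from current state
        x = x + (x + u) * dt                   # apply the control for one step
        xs.append(x)
    return np.array(xs)

candidates = np.linspace(-5.0, 5.0, 41)
trajectory = run_rhc(2.0, n_steps=200, dt=0.02,
                     candidates=candidates, cost_fn=rollout_cost)
```

Because the candidate grid is coarse, the computed control is only approximately optimal, and the closed loop settles in a small neighbourhood of the origin rather than converging to it exactly.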


3 Stability of Nonlinear Stochastic Receding Horizon Control and Robustness to Controller Errors

It has been shown in [43] that nonlinear stochastic receding horizon control stabilises the system to the origin if
the true optimal control is applied (and under comparable assumptions and hypotheses to those considered here).
In this section, we generalise this result to the case in which approximations in computing the optimal control are
naturally employed.^3

Going forward we write $\mathcal{B}_\delta := \{x \in \mathbb{R}^n : |x| \le \delta\}$ for the $\delta > 0$ ball around the origin and we use the shorthand
$\{v < c\}$ to denote the level set $\{x \in \mathbb{R}^n : v(x) < c\}$ for $c \ge 0$. For any $\delta \ge 0$ define $m_\delta := \inf\{c \ge 0 \mid \mathcal{B}_\delta \subseteq \{v < c\}\}$.
Informally stated, we study stability to $\{v < m_\delta\}$ where, from (1), it follows that $\{v < m_\delta\}$ is contained in a ball
around the origin with radius tending to 0 as $\delta \to 0$.

3.1 Deterministic Control Errors 

We introduce the following control signal $\hat{u}_t(x) \in \mathcal{U}$ with

$$|\hat{u}_t(x) - u^*(x)| \le \varepsilon, \quad \forall (x,t) \in \mathbb{R}^n \times [0,\infty)$$

for some sufficiently small $\varepsilon > 0$. We denote by $\hat{X}_t^{0,x_0,\hat{u}}$ trajectories of (1) driven by $\hat{u}_t(x)$ with $\hat{X}_0 = X_0 = x_0$. If
$\hat{u}_t(x) \to u^*(x)$ then $\hat{X}_t \to X_t$ for all $t \ge 0$; i.e. we recover the optimally controlled process in some suitable
sense (to be made precise). The goal in this subsection is to prove that if $|\hat{u}_t(x) - u^*(x)|$ is small then $\hat{X}_t$ behaves
similarly to $X_t$.

We will require the following assumption. 

Assumption 3. There exists a constant $c_6 > 0$ such that $|\partial_x v(x)| \le c_6 (1 + |x|^p)$, $\forall x \in \mathbb{R}^n$.

For generality, we have stated our requirement that $|\partial_x v(x)| \le c_6 (1 + |x|^p)$ as an assumption. Nevertheless, results
of this type, i.e. results concerning estimates/bounds of the derivative of the value function, are well studied^4 and
conditions on $f$, $g$, $\ell$, and $\varphi$ under which this assumption is guaranteed to hold are readily available [10] [20].

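Theorem 1. Suppose Assumptions 1, 2 and 3 and the modelling hypotheses hold. Define $\beta := c_5 \left(1 + \frac{1}{\delta^p}\right)$ and $\lambda := \frac{c_2}{\beta}$.
Solutions of the SDE (1) driven by the optimal control $u^*$ satisfy:

• if $x_0 \in \{v < m_\delta\}$ then, with probability one, $X_t^{0,x_0,u^*}$ will never exit $\{v < m_\delta\}$ and $E|X_t^{0,x_0,u^*}|^p \le \frac{m_\delta}{c_4}$, $\forall t \ge 0$;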

^2 This assumption is common and discussed further in (3^ and the references therein. It is proven to hold under quite typical modelling
constraints (compatible with the modelling hypotheses presented here).

^3 In [43], stability of the origin (under the exact controller) is studied under the classical requirement [17] that $g(0) = 0$, i.e. that the origin is an
'exact' equilibrium for the diffusion, and thus the noise 'goes to zero' at this point. Here, we consider the practical case in which approximations
are made when computing the optimal control, and we study stability to some neighbourhood of the origin. Thus, we also do not require $g(0) = 0$.

^4 For example, it is proven in [10] that $|\partial_x v(x)| \le c_6 (1 + |x|^p)$ holds under the given modelling hypotheses adopted in this work (on the
cost/dynamics) with essentially the additional assumption $|\partial_x \ell(x, u)| \le c(1 + |x|^p + |u|^p)$, $c > 0$. Thus, we already have the basic conditions
on the dynamics/cost to ensure this assumption holds. See Chapter IV.8 in [10] and also Chapter 3 and Chapter 4 in [^. Even stronger results
have been proven [10] [^ implying this assumption holds trivially when one begins imposing Lipschitz conditions on the cost $\ell(x, u)$.





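• if $x_0 \notin \{v < m_\delta\}$, it holds

$$E|X_t^{0,x_0,u^*}|^p \le \frac{1}{c_4}\left(\beta e^{-\lambda t}|x_0|^p + m_\delta\right), \quad \forall t \ge 0,$$

and, with probability one, $X_t^{0,x_0,u^*}$ hits $\{v < m_\delta\}$ in finite time.

These two points imply that almost all solutions to (1) driven by the optimal control $u^*$ are exponentially stable to
a ball around the origin and almost all trajectories remain within this ball.

Moreover there exists $\bar{\varepsilon} > 0$ such that if

$$0 < \varepsilon < \bar{\varepsilon} \quad \text{and} \quad |\hat{u}_t(x) - u_t^*(x)| \le \varepsilon, \quad \forall x \in \mathbb{R}^n$$

then solutions of the SDE (1) driven by the control law $\hat{u}_t$ satisfy the following:

• if $x_0 \in \{v < m_\delta\}$, then with probability one, $\hat{X}_t^{0,x_0,\hat{u}}$ never exits $\{v < m_\delta\}$ and $E|\hat{X}_t^{0,x_0,\hat{u}}|^p \le \frac{m_\delta}{c_4}$, $\forall t$;

• if $x_0 \notin \{v < m_\delta\}$, there exists a constant $\lambda \ge \theta_\varepsilon > 0$ such that

$$E|\hat{X}_t^{0,x_0,\hat{u}}|^p \le \frac{1}{c_4}\left(\beta e^{-\theta_\varepsilon t}|x_0|^p + m_\delta\right), \quad \forall t \ge 0,$$

and the constant $\theta_\varepsilon$ satisfies $\lim_{\varepsilon \to 0} \theta_\varepsilon = \lambda$. Further, with probability one, $\hat{X}_t^{0,x_0,\hat{u}}$ hits $\{v < m_\delta\}$ in finite time.

These two points imply that almost all solutions to (1) driven by the approximate control $\hat{u}_t$ are exponentially
stable to a ball around the origin and almost all trajectories remain within this ball. Such solutions converge
exponentially fast under $\hat{u}_t$ but slower than under $u^*$. In the limit $\varepsilon \to 0$ we recover the stability properties of the optimal
controller.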

Proof. Going forward, we use the shorthand $\hat{X}_t$ for the process (1) driven by $\hat{u}_t(\hat{X}_t)$. We prove only the second half
of the theorem concerning the approximate controller $\hat{u}_t$. Statements on the optimal control follow with $\varepsilon = 0$.

Under the hypotheses of the theorem it follows that $c_4 |x|^p \le v(x) \le c_5 (1 + |x|^p)$ for some $c_2, c_4, c_5 > 0$. Let $\mathcal{L}$
denote the infinitesimal generator [17] of $X_t$. Then

$$\mathcal{L}v = f(x, u^*(x))^\top \partial_x v + \tfrac{1}{2}\mathrm{tr}\left[g(x)g(x)^\top \partial_{xx} v\right]$$

is a function of $x \in \mathbb{R}^n$ derived by applying the infinitesimal generator to $v(x)$. From (4), we have $\mathcal{L}v = -\ell(x, u^*(x)) - \partial_s v(s,x)|_{s=t}$
which by the modelling hypotheses and Assumption 2 is strictly negative definite: $\mathcal{L}v \le -c_2 |x|^p$ for all
$x \in \mathbb{R}^n \setminus \{0\}$ and $\mathcal{L}v = 0 \Leftrightarrow x = 0$.

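Now, Itô's formula yields

$$dv(\hat{X}_t) = \langle f(\hat{X}_t, \hat{u}_t(\hat{X}_t)), \partial_x v(\hat{X}_t)\rangle\,dt + \tfrac{1}{2}\mathrm{tr}\left[g(\hat{X}_t)g(\hat{X}_t)^\top \partial_{xx} v(\hat{X}_t)\right]dt + \langle g(\hat{X}_t)\,dW_t, \partial_x v(\hat{X}_t)\rangle$$

Adding and subtracting $\langle f(\hat{X}_t, u^*(\hat{X}_t)), \partial_x v(\hat{X}_t)\rangle$, and using the HJB equation (4), we obtain

$$dv(\hat{X}_t) = \mathcal{L}v(\hat{X}_t)\,dt + \langle f(\hat{X}_t, \hat{u}_t(\hat{X}_t)) - f(\hat{X}_t, u^*(\hat{X}_t)), \partial_x v(\hat{X}_t)\rangle\,dt + \langle g(\hat{X}_t)\,dW_t, \partial_x v(\hat{X}_t)\rangle$$

Set

$$\hat{\mathcal{L}}v(x) := \mathcal{L}v(x) + \langle f(x, \hat{u}_t(x)) - f(x, u^*(x)), \partial_x v(x)\rangle$$

where $\hat{\mathcal{L}}v(x)$ is the infinitesimal generator of $\hat{X}_t$ applied to $v(x)$, for any $x \in \mathbb{R}^n$.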

Recall that the infinitesimal generator is a purely local construction [17] which allows us to consider $\hat{\mathcal{L}}v(x)$ and
$\mathcal{L}v(x) \le -c_2 |x|^p$ at the same point in space-time.

By using the Lipschitz condition on $f(x, u)$ and the Cauchy-Schwarz inequality we obtain

$$\hat{\mathcal{L}}v(x) \le \mathcal{L}v(x) + c_1 |\partial_x v|\,|\hat{u}_t(x) - u^*(x)|$$

$$\le -c_2 |x|^p + \varepsilon c_1 |\partial_x v|$$

$$\le -c_2 |x|^p + \varepsilon c_1 c_6 (1 + |x|^p) = (-c_2 + \varepsilon c_1 c_6)|x|^p + \varepsilon c_1 c_6$$


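We define

$$\alpha_\varepsilon := c_2 - \left(\frac{1}{\delta^p} + 1\right)\varepsilon c_1 c_6$$

and

$$\theta_\varepsilon := \frac{\alpha_\varepsilon}{\beta}$$

There exists $\bar{\varepsilon} > 0$ small enough so $\alpha_\varepsilon > 0$, $\forall \varepsilon < \bar{\varepsilon}$. Moreover we see that $\lim_{\varepsilon \to 0} \alpha_\varepsilon = c_2 \Rightarrow \lim_{\varepsilon \to 0} \theta_\varepsilon = \lambda$. Going
forward we write $\alpha = \alpha_\varepsilon$ for simplicity. It is easy to check that on $\{x \in \mathbb{R}^n : |x| > \delta\}$ we have

$$v(x) \le \beta |x|^p \quad \text{and} \quad \hat{\mathcal{L}}v(x) \le -\alpha |x|^p \qquad (5)$$

We define $V(t,x) := v(x)e^{\frac{\alpha}{\beta}t}$. Then, on the set $\{x \in \mathbb{R}^n : |x| > \delta\}$, it follows

$$\hat{\mathcal{L}}V(t,x) = e^{\frac{\alpha}{\beta}t}\left(\frac{\alpha}{\beta}v(x) + \hat{\mathcal{L}}v(x)\right) \le e^{\frac{\alpha}{\beta}t}\left(\alpha |x|^p - \alpha |x|^p\right) = 0 \qquad (6)$$

where $\hat{\mathcal{L}}V(t,x)$ is the infinitesimal generator of $\hat{X}_t$ applied to $V(t,x)$, for any $x \in \mathbb{R}^n$ at $t \ge 0$.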

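Assume now $x_0 \in \{v < m_\delta\}$. Given $t \ge 0$, define the stopping times

$$\tau_1 := \inf\{s \ge 0 \mid \hat{X}_s \notin \{v < m_\delta\}\} \wedge t \quad \text{and} \quad \tau_2 := \inf\{s \ge \tau_1 \mid \hat{X}_s \in \{v < m_\delta\}\} \wedge t$$

i.e. $\tau_1, \tau_2$ are, respectively, the first exit and re-entry time of the process $\hat{X}_t$ in $\{v < m_\delta\}$ before $t$. Going forward we
write $V(\hat{X}_t)$ in place of $V(t, \hat{X}_t)$ for simplicity. By definition, for any $t_1 \le t_2$, we have

$$V(\hat{X}_{t_2}) - V(\hat{X}_{t_1}) = \int_{t_1}^{t_2} \hat{\mathcal{L}}V(\hat{X}_s)\,ds + \int_{t_1}^{t_2} e^{\theta_\varepsilon s}\,\partial_x v(\hat{X}_s)^\top g(\hat{X}_s)\,dW_s$$

We know the stochastic integral of $e^{\theta_\varepsilon s}\,\partial_x v(\hat{X}_s)^\top g(\hat{X}_s)\,dW_s$ is a martingale. Then, the optional sampling theorem [9] implies

$$E\left[V(\hat{X}_{\tau_2}) - V(\hat{X}_{\tau_1})\right] = E\left[\int_{\tau_1}^{\tau_2} \hat{\mathcal{L}}V(\hat{X}_s)\,ds\right] \le 0$$

where the last inequality follows from (6). We also note that, by definition, $V(\hat{X}_{\tau_2}) \ge V(\hat{X}_{\tau_1})$. Therefore we have
$V(\hat{X}_{\tau_2}) = V(\hat{X}_{\tau_1})$ almost surely and consequently $\tau_1 = \tau_2$ almost surely. Thus, if the process starts in the set $\{v < m_\delta\}$
it can never exit this set. It follows that

$$E|\hat{X}_t|^p \le \frac{1}{c_4}E[v(\hat{X}_t)] \le \frac{m_\delta}{c_4}$$

and proof of the first point is complete.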

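Now assume $x_0 \notin \{v < m_\delta\}$, fix $t$ and define the following stopping time

$$\tau := \inf\{s \ge 0 \mid \hat{X}_s \in \{v < m_\delta\}\} \wedge t$$

Note that we have already considered the case $t \ge \tau \Rightarrow \hat{X}_t \in \{v < m_\delta\}$. Now write

$$V(\hat{X}_t) = v(x_0) + V(\hat{X}_\tau) - v(x_0) + V(\hat{X}_t) - V(\hat{X}_\tau)$$

and take the expectation of both sides. Arguing as before $E[V(\hat{X}_\tau) - v(x_0)] \le 0$. Moreover

$$E\left[V(\hat{X}_t) - V(\hat{X}_\tau)\right] \le m_\delta\,E\left[e^{\theta_\varepsilon t}\mathbf{1}_{\{\tau < t\}}\right] \le m_\delta\,e^{\theta_\varepsilon t}$$

Hence, using the two inequalities just shown, (5) and (3) we have

$$E|\hat{X}_t|^p \le \frac{1}{c_4}E[v(\hat{X}_t)] = \frac{e^{-\theta_\varepsilon t}}{c_4}E[V(\hat{X}_t)] \le \frac{e^{-\theta_\varepsilon t}}{c_4}\left(v(x_0) + m_\delta e^{\theta_\varepsilon t}\right) \le \frac{\beta}{c_4}|x_0|^p e^{-\theta_\varepsilon t} + \frac{m_\delta}{c_4}$$

and proof of so-called exponential p-stability is complete.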

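Now, we have already shown $E[V(\hat{X}_\tau) - V(x_0)] = E\left[\int_0^\tau \hat{\mathcal{L}}V(\hat{X}_s)\,ds\right] \le 0$ which implies $E[v(\hat{X}_\tau)e^{\theta_\varepsilon \tau}] \le v(x_0)$. Further,

$$E\left[v(\hat{X}_\tau)e^{\theta_\varepsilon \tau}\right] = E\left[v(\hat{X}_\tau)e^{\theta_\varepsilon \tau}\mathbf{1}_{\{\tau < t\}} + v(\hat{X}_\tau)e^{\theta_\varepsilon \tau}\mathbf{1}_{\{\tau = t\}}\right] \ge E\left[v(\hat{X}_\tau)e^{\theta_\varepsilon \tau}\mathbf{1}_{\{\tau = t\}}\right] \ge m_\delta\,e^{\theta_\varepsilon t}P(\tau = t)$$

and thus $m_\delta\,e^{\theta_\varepsilon t}P(\tau = t) \le v(x_0)$ for all $t$. This implies $P(\tau = t) \to 0$ as $t \to \infty$ which in turn implies that $P(\tau < \infty) = 1$.
From this and inequality (6) it follows that $V(t, \hat{X}_t)$ is a positive supermartingale. From Theorem 5.1 in [17] it follows
that $V(t, \hat{X}_t)$ converges almost surely to a finite limit (dependent on $x_0$) as $t \to \infty$. Then, from (5) we have

$$|\hat{X}_t|^p \le \frac{\sup_t V(t, \hat{X}_t)}{c_4}\,e^{-\theta_\varepsilon t}$$

with probability one. Letting $t \to \infty$ proves that almost all solutions converge exponentially fast toward $\{v < m_\delta\}$.
Results of this type are known [17], i.e. where p-th moment exponential stability implies almost sure exponential
stability. □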
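The constants appearing in Theorem 1 and its proof are simple to evaluate. The helper below is a small sketch, assuming the relations $\beta = c_5(1 + \delta^{-p})$, $\lambda = c_2/\beta$, $\alpha_\varepsilon = c_2 - (1 + \delta^{-p})\varepsilon c_1 c_6$ and $\theta_\varepsilon = \alpha_\varepsilon/\beta$ as they are read from the statement and proof; the function name `stability_rates` and the numerical values are hypothetical and chosen only to show how the guaranteed rate degrades as the control error grows.

```python
def stability_rates(c1, c2, c5, c6, delta, p, eps):
    """Evaluate the rate constants for a deterministic control error eps.

    Returns beta, the optimal rate lam, alpha_eps, the degraded rate
    theta_eps, and the largest error eps_bar for which alpha_eps > 0.
    """
    scale = 1.0 + delta ** (-p)          # (1 + 1/delta^p)
    beta = c5 * scale
    lam = c2 / beta                      # rate under the optimal control
    alpha = c2 - scale * eps * c1 * c6   # shrinks linearly in eps
    theta = alpha / beta                 # rate under the perturbed control
    eps_bar = c2 / (scale * c1 * c6)     # alpha vanishes at eps = eps_bar
    return {"beta": beta, "lambda": lam, "alpha": alpha,
            "theta": theta, "eps_bar": eps_bar}

# Hypothetical constants: the rate halves at eps = eps_bar / 2.
exact = stability_rates(c1=1.0, c2=1.0, c5=2.0, c6=1.0, delta=0.5, p=2, eps=0.0)
perturbed = stability_rates(c1=1.0, c2=1.0, c5=2.0, c6=1.0, delta=0.5, p=2, eps=0.1)
```

Note how a smaller target ball (smaller `delta`) both lowers the guaranteed rate and tightens the admissible error `eps_bar`, which matches the qualitative message of the theorem.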






3.2 Probabilistic Control Errors 

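We now turn our attention to the case where the perturbed controller has a Gaussian distribution,

$$\hat{u}_t(x) \sim \mathcal{N}(u^*(x), \Sigma(x))$$

and the evolution of the nonlinear controlled process $X_t^{0,x_0,u}(\omega) : [0,\infty) \times \Omega \to \mathbb{R}^n$ follows

$$dX_t^{0,x_0,u} = f(X_t^{0,x_0,u})\,dt + h(X_t^{0,x_0,u})\,u_t\,dt + g(X_t^{0,x_0,u})\,dW_t \qquad (7)$$

with the existing modelling hypotheses holding. Here $h : \mathbb{R}^n \to \mathbb{R}^{n \times m}$ is continuous with

$$|f(x) - f(y)| + |g(x) - g(y)| + |h(x) - h(y)| \le c_1 |x - y|, \quad \forall (x,y) \in \mathbb{R}^n \times \mathbb{R}^n$$

for some finite constant $c_1 > 0$. We denote by $\hat{X}_t^{0,x_0,\hat{u}}$ trajectories of (7) driven by $\hat{u}_t(x)$ with $\hat{X}_0 = X_0 = x_0$.

In this subsection we seek a result analogous to Theorem 1 under the proposed probabilistic controller error
model. The goal is to show that if $\Sigma(x) \to 0$ for all $x \in \mathbb{R}^n$ then $\hat{u}_t(x) \to u^*(x)$ and $\hat{X}_t \to X_t$ for all $t \ge 0$; i.e. we
recover the optimally controlled process in some suitable sense.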

As before we need a further assumption on the derivatives of the value function. 

Assumption 4. One of the two following conditions holds:

• there exists a constant $c_7 > 0$ such that $|\partial_{xx} v(x)| \le c_7 (1 + |x|^{p-2})$, $\forall x \in \mathbb{R}^n$;

• there exists a constant $c_7 > 0$ such that $|\partial_{xx} v(x)| \le c_7 (1 + |x|^p)$, $\forall x \in \mathbb{R}^n$ and $h(x)$ is bounded.

As with Assumption 3 we have stated our requirement that $|\partial_{xx} v(x)| \le c_7 (1 + |x|^p)$ as an assumption (for the
sake of generality). Yet similarly again, results of this type, i.e. results concerning estimates/bounds of the second
derivative of the value function, are well studied in the literature^5 [10] [20].

The following is the main result of this subsection. 

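Theorem 2. Suppose Assumptions 1, 2 and 4 and the relevant modelling hypotheses hold. Define $\beta := c_5 \left(1 + \frac{1}{\delta^p}\right)$ and
$\lambda := \frac{c_2}{\beta}$. Solutions of (7) driven by the optimal control $u^*$ satisfy the relevant convergence results in Theorem 1.
Moreover, there exists $\bar{\varepsilon} > 0$ such that if $0 < \varepsilon < \bar{\varepsilon}$ and the following holds $\forall x \in \mathbb{R}^n$

• $\hat{u}_t(x) \sim \mathcal{N}(u^*(x), \Sigma(x))$;

• $0 < \Sigma(x) = \Sigma(x)^\top$ and for any norm $|\Sigma(x)| \le \varepsilon$

then solutions of the SDE (7) driven by the approximated controller $\hat{u}_t$ satisfy the following:

• if $x_0 \in \{v < m_\delta\}$, then with probability one, $\hat{X}_t^{0,x_0,\hat{u}}$ never exits $\{v < m_\delta\}$ and $E|\hat{X}_t^{0,x_0,\hat{u}}|^p \le \frac{m_\delta}{c_4}$, $\forall t$;

• if $x_0 \notin \{v < m_\delta\}$, there exists a constant $\lambda \ge \theta_\varepsilon > 0$ such that

$$E|\hat{X}_t^{0,x_0,\hat{u}}|^p \le \frac{1}{c_4}\left(\beta e^{-\theta_\varepsilon t}|x_0|^p + m_\delta\right), \quad \forall t \ge 0,$$

and $\theta_\varepsilon$ obeys $\lim_{\varepsilon \to 0} \theta_\varepsilon = \lambda$ (where $\lambda$ is the convergence rate of the optimal control; see Theorem 1). Further, with
probability one, $\hat{X}_t^{0,x_0,\hat{u}}$ hits $\{v < m_\delta\}$ in finite time.

Thus, almost all solutions to (7) driven by the approximate control $\hat{u}_t$ are exponentially stable to a ball around
the origin and almost all trajectories remain within this ball. Such solutions converge exponentially fast under $\hat{u}_t$ but
slower than under $u^*$. As $\varepsilon \to 0$ we recover the stability properties of the optimal controller.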

Proof. As before, denote by $\hat{X}_t$ the process (7) driven by the approximated control $\hat{u}_t(\hat{X}_t)$. We quickly find

$$d\hat{X}_t = f(\hat{X}_t)\,dt + h(\hat{X}_t)u^*\,dt + h(\hat{X}_t)(\hat{u}_t - u^*)\,dt + g(\hat{X}_t)\,dW_t$$

Since $\Sigma(x)$ is (symmetric) positive-definite we have $\Sigma(x)^{1/2}\Sigma(x)^{1/2} = \Sigma(x)$ where $\Sigma(x)^{1/2}$ exists and is unique. Then $\hat{u}_t(x) \sim \mathcal{N}(u^*(x), \Sigma(x))$
implies $(\hat{u}_t - u^*)\,dt = \Sigma(x)^{1/2}\,dY_t$ where $Y_t$ is a standard Brownian motion [1]. The two Brownian motions
$Y_t$ and $W_t$ are realised on two different spaces: we have already fixed $\Omega$ and we denote by $\Omega'$ the space
associated with the probabilistic controller approximation such that $[W_t^\top, Y_t^\top]^\top$ defines a fixed Brownian motion on
$\Omega \times \Omega'$.

^5 As with Assumption 3 it is proven in [10] that $|\partial_{xx} v(x)| \le c_7 (1 + |x|^p)$ holds under the modelling hypotheses adopted in this work (on the
cost/dynamics), with essentially the additional assumption that $|\partial_x \ell(x, u)| \le c(1 + |x|^p + |u|^p)$ and $|\partial_{xx} \ell(x, u)| \le c(1 + |x|^p + |u|^p)$, $c > 0$. See
Chapter IV.9 in [10] and also Chapter 3 and Chapter 4 in [23]. Again, stronger results have been proven [10] |^ implying this assumption holds
trivially when one imposes Lipschitz conditions on the cost $\ell(x, u)$, which is common in similar analysis EH-



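Let $v(x) = E\left[\varphi(X_T) + \int_0^T \ell(X_s, u^*)\,ds\right]$ where the process $X_t$ defining $v(x)$ is defined by (7) driven with the optimal
control $u^*(x)$. We consider

$$\hat{\mathcal{L}}v = \langle f(x) + h(x)u^*, \partial_x v\rangle + \tfrac{1}{2}\mathrm{tr}\left[g(x)g(x)^\top \partial_{xx} v\right] + \tfrac{1}{2}\mathrm{tr}\left[\Sigma(x)h(x)h(x)^\top \partial_{xx} v\right]$$

where $\hat{\mathcal{L}}v(x)$ is the infinitesimal generator of $\hat{X}_t$ applied to $v(x)$, for any $x \in \mathbb{R}^n$ at $t \ge 0$. Again, $\hat{\mathcal{L}}v(x)$ should be
viewed as a function of $x \in \mathbb{R}^n$.

We know that

$$\langle f(x) + h(x)u^*, \partial_x v\rangle + \tfrac{1}{2}\mathrm{tr}\left[g(x)g(x)^\top \partial_{xx} v\right] \le -c_2 |x|^p$$

from the proof of Theorem 1, i.e. $\mathcal{L}v(x) \le -c_2 |x|^p$. Owing to Assumption 4 we have, for some positive constant $c$,

$$\tfrac{1}{2}\mathrm{tr}\left[\Sigma(x)h(x)h(x)^\top \partial_{xx} v\right] \le \varepsilon c (|x|^p + 1)$$

and therefore,

$$\hat{\mathcal{L}}v(x) \le -c_2 |x|^p + \varepsilon c |x|^p + \varepsilon c$$

Define $\alpha_\varepsilon := c_2 - \left(\frac{1}{\delta^p} + 1\right)\varepsilon c$ and the proof now follows exactly that of Theorem 1 and we omit the repetition for
brevity. □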
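The step in the proof that converts a Gaussian control error into an additional Brownian term can be illustrated numerically. The sketch below is illustrative only: it computes the symmetric square root of a covariance matrix and the resulting noise covariance per unit time of the state, using the standard rule that a linear map $h$ applied to noise with covariance $\Sigma$ yields covariance $h \Sigma h^\top$. The function names and the example matrices are hypothetical.

```python
import numpy as np

def sym_sqrt(sigma):
    """Symmetric square root of a symmetric positive-definite matrix,
    computed from its eigendecomposition."""
    w, v = np.linalg.eigh(sigma)
    return (v * np.sqrt(w)) @ v.T        # V diag(sqrt(w)) V^T

def effective_diffusion(g, h, sigma):
    """Noise covariance per unit time of the state when the process noise
    enters through g and an independent control error with covariance
    sigma enters through h."""
    return g @ g.T + h @ sigma @ h.T

# Hypothetical two-state, one-input example.
g = 0.2 * np.eye(2)
h = np.array([[1.0], [0.0]])
sigma = np.array([[0.04]])
root = sym_sqrt(sigma)
cov = effective_diffusion(g, h, sigma)
```

In this example the control error doubles the noise intensity in the actuated coordinate and leaves the other coordinate untouched, which is why a small bound on $|\Sigma(x)|$ translates into a small perturbation of the generator.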

3.3 Mixed Type Errors and Sampled Control 

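We now state a simple corollary that takes into account a mixed probabilistic and deterministic controller error.

Corollary 1. Suppose we are working under (7) and Assumptions 1 to 4 and the modelling hypotheses outlined thus
far hold. Define $\beta := c_5 \left(1 + \frac{1}{\delta^p}\right)$ and $\lambda := \frac{c_2}{\beta}$. There exist $\bar{\varepsilon}_1, \bar{\varepsilon}_2 > 0$ such that if $0 < \varepsilon_1 < \bar{\varepsilon}_1$ and $0 < \varepsilon_2 < \bar{\varepsilon}_2$ and

• $[\hat{u}_t(x) - u^*(x)] \sim \mathcal{N}(\mu(x), \Sigma(x))$,

• $|\mu(x)| \le \varepsilon_1$,

• $0 < \Sigma(x) = \Sigma(x)^\top$ and for any norm $|\Sigma(x)| \le \varepsilon_2$;

holds $\forall x \in \mathbb{R}^n$, then solutions of the SDE (7) driven by the approximate control law $\hat{u}_t$ satisfy:

• if $x_0 \in \{v < m_\delta\}$, then with probability one, $\hat{X}_t^{0,x_0,\hat{u}}$ never exits $\{v < m_\delta\}$ and $E|\hat{X}_t^{0,x_0,\hat{u}}|^p \le \frac{m_\delta}{c_4}$, $\forall t \ge 0$;

• if $x_0 \notin \{v < m_\delta\}$, put $\varepsilon = (\varepsilon_1, \varepsilon_2)$. There exists a constant $\lambda \ge \theta_\varepsilon > 0$ such that

$$E|\hat{X}_t^{0,x_0,\hat{u}}|^p \le \frac{1}{c_4}\left(\beta e^{-\theta_\varepsilon t}|x_0|^p + m_\delta\right), \quad \forall t \ge 0,$$

and $\theta_\varepsilon$ obeys $\lim_{\varepsilon \to 0} \theta_\varepsilon = \lambda$ (where $\lambda$ is the convergence rate of the optimal control; see Theorem 1). Also, with
probability one, $\hat{X}_t^{0,x_0,\hat{u}}$ hits $\{v < m_\delta\}$ in finite time, i.e. almost all solutions converge exponentially fast toward
$\{v < m_\delta\}$.

Proof. If $\mu(x) \in \mathcal{B}_{\varepsilon_1} = \{x \in \mathbb{R}^n : |x| \le \varepsilon_1\}$ then $(\hat{u}_t(x) - u^*(x)) \sim \mathcal{N}(\mu(x), \Sigma(x))$ implies $(\hat{u}_t - u^*)\,dt = \mu(x)\,dt + \Sigma(x)^{1/2}\,dY_t$
where $Y_t$ is a standard Brownian motion. It is then easily seen that the error is split in two parts, one part formed
by the added Brownian motion and the other part formed by the deterministic error $\mu(x)$ with $|\mu(x)| \le \varepsilon_1$, $\forall x \in \mathbb{R}^n$.
The proofs of both Theorem 1 and Theorem 2 apply readily in this case and we omit the details for brevity. □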

In many practical scenarios it is impossible to compute the optimal control instantaneously and one must instead
resort to a sample and hold approach to control whereby the control is computed at discrete-time increments
and held constant in between such times. Stability results for such approaches have been considered, e.g., in deterministic
settings [28] and stochastic settings (^. We now provide a related stability result.

Proposition 1. Consider the more general controlled process (1) and suppose Assumptions 1, 2 and the relevant
modelling hypotheses hold. Suppose that under a given control law $u_t(x)$ the solution $X_t^{0,x_0,u}$ to the SDE (1) with
initial condition $x_0$ satisfies

$$E|X_t^{0,x_0,u}|^p \le c\,e^{-\lambda t}|x_0|^p + m, \quad \forall t \ge 0 \qquad (8)$$

for some positive constants $\lambda$, $c$ and $m$. Now fix a time step $\Delta > 0$ and let the time interval $t \in [0,\infty)$ be discretised
according to $t_0 = 0, t_1 = \Delta, t_2 = 2\Delta, \ldots, t_k = k\Delta$. Consider the control law defined by $\hat{u}_t(x_t) = u_{t_k}(x_{t_k})$ for $t \in [t_k, t_{k+1})$,
i.e. the control $\hat{u}_t$ is held constant over small time intervals with a value given by the control $u_t$ at the beginning of
each interval. Then there exists a constant step size $\bar{\Delta} > 0$ and constants $M_1, M_2 > 0$ and a decay rate $\gamma > 0$ such that

$$E|\hat{X}_t^{0,x_0,\hat{u}}|^p \le M_1 e^{-\gamma t}|x_0|^p + M_2, \quad \forall t \ge 0$$

for all $0 < \Delta \le \bar{\Delta}$ and where we denote by $\hat{X}_t^{0,x_0,\hat{u}}$ trajectories of (1) driven by $\hat{u}_t(x)$ with $\hat{X}_0 = X_0 = x_0$.


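Proof. Consider the two stochastic differential equations of the form (1) but driven by the two different controls
defined in the statement of the proposition

$$dX_t^{0,x_0,u} = f(X_t^{0,x_0,u}, u_t)\,dt + g(X_t^{0,x_0,u})\,dW_t$$

$$d\hat{X}_t^{0,x_0,\hat{u}} = f(\hat{X}_t^{0,x_0,\hat{u}}, \hat{u}_t)\,dt + g(\hat{X}_t^{0,x_0,\hat{u}})\,dW_t$$

Then, since both processes share a common initial point, it is straightforward to show that the two Euler-Maruyama
time-discretisations of both processes are identical. That is, by induction on $k \in \mathbb{N}$ we have

$$Z_{t_{k+1}}^{0,x_0,u} = Z_{t_k}^{0,x_0,u} + f(Z_{t_k}^{0,x_0,u}, u_{t_k})\Delta + g(Z_{t_k}^{0,x_0,u})W_\Delta$$

$$\hat{Z}_{t_{k+1}}^{0,x_0,\hat{u}} = \hat{Z}_{t_k}^{0,x_0,\hat{u}} + f(\hat{Z}_{t_k}^{0,x_0,\hat{u}}, \hat{u}_{t_k})\Delta + g(\hat{Z}_{t_k}^{0,x_0,\hat{u}})W_\Delta = Z_{t_{k+1}}^{0,x_0,u}$$

where $W_\Delta \sim \mathcal{N}(0, \Delta \cdot I)$ and $\hat{Z}_0 = Z_0 = x_0$.

There is a known result [12] which states that p-th moment exponential stability of a stochastic differential
equation implies p-th moment exponential stability of its Euler-Maruyama simulation and vice-versa (if the time-step
$\Delta > 0$ is sufficiently small). Thus, with minor modifications to the main result of [12] it follows that if (8) holds,
then for a sufficiently small step size $\Delta$, the Euler-Maruyama approximation of $X_t^{0,x_0,u}$ satisfies

$$E|Z_t^{0,x_0,u}|^p \le c|x_0|^p e^{-\gamma' t} + M, \quad \forall t \ge 0$$

for some $M > 0$ and decay rate $\gamma' > 0$. The same holds for $\hat{Z}_t^{0,x_0,\hat{u}}$ as this discrete-time process is identical to $Z_t^{0,x_0,u}$. Again, with slight
modification to the results in [12] it follows that if $\Delta$ is small enough then

$$E|\hat{X}_t^{0,x_0,\hat{u}}|^p \le M_1 e^{-\gamma t}|x_0|^p + M_2, \quad \forall t \ge 0$$

for some positive constants $M_1, M_2$ and decay rate $\gamma > 0$. This completes the proof. □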

A straightforward consequence of Proposition 1 is that the convergence results given thus far concerning the
various controller approximation errors will continue to hold even if the control is computed only at discrete-time
instants and held constant in the interval between such instants (provided that the time elapsed between successive
updates is small).
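The sample-and-hold effect can be illustrated numerically. The following is a minimal sketch on a hypothetical scalar system $dX = (aX + u)\,dt + \sigma\,dW$ with the illustrative feedback $u = -kX$; the system, the gains and the function name `simulate_second_moment` are assumptions chosen for the example and do not come from the analysis above. The control is recomputed every `delta` seconds and held in between, while the SDE itself is integrated with a much finer Euler-Maruyama step.

```python
import numpy as np

def simulate_second_moment(delta, dt=1e-3, t_end=5.0, x0=2.0,
                           a=1.0, k=3.0, sigma=0.2, n_paths=2000, seed=0):
    """Estimate E|X(t_end)|^2 for dX = (a X + u) dt + sigma dW, where the
    feedback u = -k X is recomputed every `delta` seconds and held constant
    in between (toy system, chosen only for illustration)."""
    rng = np.random.default_rng(seed)
    steps_per_hold = max(1, int(round(delta / dt)))
    n_steps = int(round(t_end / dt))
    x = np.full(n_paths, x0)
    u = -k * x
    for step in range(n_steps):
        if step % steps_per_hold == 0:
            u = -k * x                      # control update at t_k = k * delta
        dw = rng.normal(0.0, np.sqrt(dt), size=n_paths)
        x = x + (a * x + u) * dt + sigma * dw   # Euler-Maruyama step
    return float(np.mean(x ** 2))

if __name__ == "__main__":
    for delta in (1e-3, 0.05, 0.2, 1.0):
        print(delta, simulate_second_moment(delta))
```

For this toy system the second moment settles near a small positive value when the hold interval is short, and grows when the hold interval is too long, which is consistent with the requirement that the step size be sufficiently small.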

The next result brings everything together. 

Corollary 2. Suppose the assumptions of either Theorem 1, Theorem 2 or Corollary 1 hold. Suppose also that $u_t$ is an
approximately optimal control law satisfying the requirements of the respective result; e.g. $|u_t - u_t^*| < \epsilon < \bar{\epsilon}$ in Theorem
1, etc. Fix $\delta > 0$; we know that there exists a constant $\theta_\epsilon > 0$, satisfying the statement of the respective result, such that

$$\mathbb{E}\,|X^{0,x_0,u}(t)|^p \le \beta e^{-\theta_\epsilon t}|x_0|^p + m_\delta, \quad \forall\, t \ge 0.$$

Now suppose that $u_t(x)$ is computed at discrete times $t_k$ with $t_0 = 0$, $t_1 = \Delta$, $t_2 = 2\Delta$, ..., $t_k = k\Delta$ and held constant
on the interval $t \in [t_k, t_{k+1})$ as described in Proposition 1. Then there exists a constant step size $\bar{\Delta} > 0$ and constants
$M_1, M_2 > 0$ such that

$$\mathbb{E}\,|X^{0,x_0,\bar{u}}(t)|^p \le M_1 \beta e^{-\frac{\theta_\epsilon}{4}t}|x_0|^p + M_2 m_\delta, \quad \forall\, t \ge 0$$

for all $0 < \Delta \le \bar{\Delta}$.


4 Monte Carlo Methods for Approximately Optimal Stochastic Control 

In this section we outline an approximation method to compute the optimal nonlinear stochastic RHC. This method 
relies on simulating a stochastic process that is related to the original controlled system but that is independent of 
the control signal. The approximation method outlined in this section was first considered by Kappen [15][16] for
finite-horizon optimal control and then subsequently studied, applied, and generalised in, e.g., [TOIIMII^ItTI I^I^ . 

Recall that we are considering the nonlinear controlled process $X_t^{0,x_0,u}(\omega): [0,\infty) \times \Omega \to \mathbb{R}^n$ defined by

$$dX_t^{0,x_0,u} = f(X_t^{0,x_0,u})\,dt + h(X_t^{0,x_0,u})\,u_t\,dt + g(X_t^{0,x_0,u})\,dW_t \qquad (9)$$

Footnote (proof of Proposition 1): The result in [12] must be modified since here we consider exponential stability to a ball around the origin (not to the origin itself as in [12]). Thus,
instead of the strong result of [12], we merely want exponential stability to the ball for an SDE to imply exponential stability to a (possibly different)
ball for its Euler-Maruyama simulation (and vice-versa). That this relaxation holds follows easily (intuitively) from the strong result in [12], and
it is unsurprising. Details on the modifications required to relax the result stated in [12] are available upon request (but would needlessly distract
from the proof otherwise).



with the existing modelling hypotheses holding. Here, $h(x)$ and $g(x)$ (which may be non-square) are assumed (with
no real loss of generality) to have full rank. Note, $h(x)$ full rank implies the existence and uniqueness of a left-inverse,
i.e. a function $h^{-1}(x)$ such that $h^{-1}(x)h(x) = I$, $\forall x \in \mathbb{R}^n$. Associate with (9) the following receding
cost functional

$$w(t,s,x,u) := \mathbb{E}\left[\phi\big(X_{t+T}^{t+s,x,u}\big) + \int_{t+s}^{t+T}\Big(\tfrac{1}{2}u_r^\top R\,u_r + \ell\big(X_r^{t+s,x,u}\big)\Big)\,dr\right]$$

at any time $t \ge 0$ with $s \in [0,T]$ and where the cost on the control input is now quadratic and $R$ is a constant
positive definite matrix. We define the value-to-go functional as

$$v(t,s,x) := \inf_{u_r \in \mathcal{U}[t+s,t+T]} w(t,s,x,u) = \inf_{u_r \in \mathcal{U}[t+s,t+T]} \mathbb{E}\left[\phi\big(X_{t+T}^{t+s,x,u}\big) + \int_{t+s}^{t+T}\Big(\tfrac{1}{2}u_r^\top R\,u_r + \ell\big(X_r^{t+s,x,u}\big)\Big)\,dr\right] \qquad (10)$$

where $\mathcal{U}[t+s,t+T]$ is the set of admissible controls in the interval $[t+s, t+T]$.

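The HJB equation associated with the value functional (10) is

$$-\partial_s v(s,x) = \inf_u \Big\{ \ell(x) + \tfrac{1}{2}u^\top R\,u + [f(x) + h(x)u]^\top \partial_x v(s,x) + \tfrac{1}{2}\mathrm{tr}\big[g(x)g(x)^\top \partial_{xx} v(s,x)\big] \Big\}$$

with a terminal boundary $v(T,x) = \phi(x)$. The optimal control on the interval defined by $s \in [0,T]$ is just $u_s^*(x) =
-R^{-1}h(x)^\top \partial_x v(s,x)$ for all $x \in \mathbb{R}^n$. In (one-step) RHC we are only interested in the solution $v(s,x)$ at $s = 0$, which we write as $v(x)$. We have

$$u^*(x) = -R^{-1}h(x)^\top \partial_x v(x), \quad \forall x \in \mathbb{R}^n$$

Substituting the optimal control back into the HJB equation gives

$$-\partial_s v(s,x) = \ell(x) - \tfrac{1}{2}\big(\partial_x v(s,x)\big)^\top h(x)R^{-1}h(x)^\top \partial_x v(s,x) + f(x)^\top \partial_x v(s,x) + \tfrac{1}{2}\mathrm{tr}\big[g(x)g(x)^\top \partial_{xx} v(s,x)\big]$$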


which is a nonlinear partial differential equation. However, we note the following log-transform of $v(s,x)$

$$\psi(s,x) = \exp\left[\frac{-v(s,x)}{\gamma}\right]$$

for all $x \in \mathbb{R}^n$, $s \in [0,T]$ and for some finite $\gamma > 0$. This transform arises in a number of stochastic control scenarios.
We often write $\psi(x)$ in place of $\psi(0,x)$. We note the following required assumption.

Assumption 5. We assume that there exists $\gamma \in \mathbb{R}$ such that $\gamma\,h(x)R^{-1}h(x)^\top = g(x)g(x)^\top$.

This assumption is standard in the path integral formulation of optimal control [16], but it also appears more
generally in the stochastic optimal control literature [10]. This assumption allows us [10][16] to write

$$-\partial_s \psi(s,x) = -\frac{1}{\gamma}\ell(x)\psi(s,x) + f(x)^\top \partial_x \psi(s,x) + \tfrac{1}{2}\mathrm{tr}\big[g(x)g(x)^\top \partial_{xx}\psi(s,x)\big]$$

which is a linear partial differential equation on $[0,T]$ with terminal condition $\psi(T,x) = \exp[-\phi(x)/\gamma]$. It now follows
by the Feynman-Kac formula that the solution to the above PDE at $(0,x)$ is given by


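$$\psi(x) = \mathbb{E}\left[\exp\left(-\frac{1}{\gamma}\phi\big(Z_{T+t}^{t,x}\big) - \frac{1}{\gamma}\int_t^{T+t}\ell\big(Z_s^{t,x}\big)\,ds\right)\right]$$

where now $Z_s^{t,x}(\omega): [t, T+t] \times \Omega \to \mathbb{R}^n$ is a nonlinear (uncontrolled) process satisfying

$$dZ_s^{t,x} = f(Z_s^{t,x})\,ds + g(Z_s^{t,x})\,dW_s \qquad (11)$$

with initial condition $Z_t^{t,x} = x$. Note that

$$u^*(x) = -R^{-1}h(x)^\top \partial_x v(x) = \gamma R^{-1}h(x)^\top \partial_x \log\psi(x)$$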
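The cancellation that turns the nonlinear HJB equation into a linear equation for $\psi$ can be checked directly. The following worked steps are a routine verification under Assumption 5, assuming that $\psi$ is strictly positive and sufficiently smooth for the derivatives to exist; arguments $(s,x)$ are suppressed for brevity.

```latex
% Worked check: substitute v = -\gamma \log\psi into the HJB equation obtained
% after inserting the optimal control (assumes \psi > 0 and \psi smooth enough).
\begin{align*}
\partial_s v &= -\gamma\,\frac{\partial_s\psi}{\psi}, \qquad
\partial_x v = -\gamma\,\frac{\partial_x\psi}{\psi}, \qquad
\partial_{xx} v = -\gamma\,\frac{\partial_{xx}\psi}{\psi}
  + \gamma\,\frac{\partial_x\psi\,(\partial_x\psi)^\top}{\psi^2}, \\
-\tfrac12(\partial_x v)^\top h R^{-1} h^\top \partial_x v
  &= -\frac{\gamma^2}{2\psi^2}(\partial_x\psi)^\top h R^{-1} h^\top \partial_x\psi
   = -\frac{\gamma}{2\psi^2}(\partial_x\psi)^\top g g^\top \partial_x\psi
   \quad\text{(Assumption 5)}, \\
\tfrac12\,\mathrm{tr}\!\left[g g^\top \partial_{xx} v\right]
  &= -\frac{\gamma}{2\psi}\,\mathrm{tr}\!\left[g g^\top \partial_{xx}\psi\right]
   + \frac{\gamma}{2\psi^2}(\partial_x\psi)^\top g g^\top \partial_x\psi .
\end{align*}
% The two quadratic terms cancel, which leaves a linear equation in \psi:
\begin{align*}
\gamma\,\frac{\partial_s\psi}{\psi}
  = \ell - \gamma\,\frac{f^\top \partial_x\psi}{\psi}
    - \frac{\gamma}{2\psi}\,\mathrm{tr}\!\left[g g^\top \partial_{xx}\psi\right]
\;\Longrightarrow\;
-\partial_s\psi = -\frac{1}{\gamma}\,\ell\,\psi + f^\top \partial_x\psi
    + \tfrac12\,\mathrm{tr}\!\left[g g^\top \partial_{xx}\psi\right].
\end{align*}
```

The role of Assumption 5 is visible in the second line: it is exactly what makes the quadratic term produced by the control cost match the quadratic term produced by the second derivative of the logarithm.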


Now, given the solution for $\psi(x)$ derived via the Feynman-Kac formula, it is informally straightforward to devise
a Monte Carlo approximation for the control; e.g. one can first simulate sample paths of (11), then form a Monte
Carlo approximation of the integral for $\psi(x)$, and approximate the spatial derivative of $\psi(x)$ via differencing. Going
forward we explore a more formal Monte Carlo approximation circumventing the need for crude numerical (spatial)
differentiation. Firstly, we need the following result.

Footnote (Assumption 5): This assumption is satisfied in many applications of stochastic control; e.g. in machine learning and robotics [16][40][38][36][41][1]. This
assumption requires the dimension of the noise and control to be equal and for the noise and control to act on the same subspace. Then, the 
cost of control can be related to the noise variance as shown. The interpretation of this relationship is that along directions where the noise 
variance is small, the control is deemed more expensive while, conversely, in those directions in which the noise has larger variance the control 
is cheap M- Indeed, this may be desirable in practice since it forces control energy to be spent mostly in those directions in which the noise 
level may be problematic m. 












Proposition 2. Suppose Assumption^anc^ and the modelling hypotheses hold. Then 


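$$u^*(x) = -R^{-1}h(x)^\top \partial_x v(x) = \lim_{r \to 0}\frac{1}{r}\,h^{-1}(x)\,\frac{\mathbb{E}\left[\exp\left(-\frac{1}{\gamma}\phi\big(Z_{t+T}^{t,x}\big) - \frac{1}{\gamma}\int_t^{t+T}\ell\big(Z_s^{t,x}\big)\,ds\right)\int_t^{t+r} g\big(Z_s^{t,x}\big)\,dW_s\right]}{\mathbb{E}\left[\exp\left(-\frac{1}{\gamma}\phi\big(Z_{t+T}^{t,x}\big) - \frac{1}{\gamma}\int_t^{t+T}\ell\big(Z_s^{t,x}\big)\,ds\right)\right]} \qquad (12)$$

where the expectations are integrals over paths defined by the SDE (11) with initial condition $Z_t^{t,x} = x$.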

Proof. This result appears in with h{x) = g(x) and it is straightforward to generalise. □ 

The controller form in Proposition 2 (and variations of such) is often referred to as the path integral formulation
of optimal control [16]. At this stage, it may appear as though the reformulated optimal controller has been
significantly complicated. However, the optimal control as given in Proposition 2 is well suited to Monte Carlo
approximation.

The Monte Carlo approach to RHC is given by Algorithm 1. Note also that we consider two time-discretizations
defined by $\Delta_1 > 0$ and $\Delta_2 > 0$ respectively. The first, $\Delta_1$, captures the sample and hold application in which the
control is computed at discrete-time steps and held constant over those intervals; i.e. we approximate the optimal
control $u^*(x)$ by $\hat{u}_t(x_t) = \hat{u}_{t_k}(x_{t_k})$ over $t \in [t_k, t_{k+1}) = [k\Delta_1, (k+1)\Delta_1)$. We denote by $X^{0,x_0,\hat{u}}$ trajectories of (9) with
$X_0^{0,x_0,\hat{u}} = x_0$ driven by $\hat{u}_t(x_t)$. The second time-discretization, $\Delta_2$, is found solely within Algorithm 1 and defines
the time-step employed during the numerical simulation of (11) used to actually compute $\hat{u}_{t_k}(x_{t_k})$ at each $t_k$.


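Algorithm 1: Optimal Control Approximation via Monte Carlo Simulation

Given at time $t = 0$:

1. Model hypotheses: $f(\cdot)$, $g(\cdot)$, $h(\cdot)$, $\phi(\cdot)$, $\ell(\cdot)$, $R$, $T$, and $\gamma$.

2. Initial starting point: $x_0 \in \mathbb{R}^n$.

3. Discretization of time $t \in [0,\infty)$ via $t_0 = 0$, $t_1 = \Delta_1$, $t_2 = 2\Delta_1$, ..., $t_k = k\Delta_1$.

4. Discretization of the interval $[0,T]$ with step-size $\Delta_2$ such that $T/\Delta_2 = K \in \mathbb{N}$.

5. Parameter approximating the limit $r > 0$ such that $r/\Delta_2 \in \mathbb{N}$.

Available at time $t_k$:

1. Current state: $x_{t_k} \in \mathbb{R}^n$.

At time $t_k$ do:

1. Simulate $N$ times the following discrete-time approximation of (11)

$$Z_{j+1}^{0,x_{t_k}} = Z_j^{0,x_{t_k}} + f\big(Z_j^{0,x_{t_k}}\big)\Delta_2 + g\big(Z_j^{0,x_{t_k}}\big)\,(W_{j+1} - W_j)$$

over the time grid $\{0, \Delta_2, \ldots, j\Delta_2, \ldots, K\Delta_2\}$ where $W_{j+1} - W_j \sim \mathcal{N}(0, \Delta_2 \cdot I)$. Simulation can be parallelised.

2. Let

$$Z_{0:K}^{0,x_{t_k}}(i) := \big\{Z_0^{0,x_{t_k}}(i) = x_{t_k},\; Z_1^{0,x_{t_k}}(i),\; \ldots,\; Z_K^{0,x_{t_k}}(i)\big\}$$

be the ordered set of sample points along the simulated discretised trajectory on the $i$-th simulation run.

3. For each sampled trajectory $i \in \{1,\ldots,N\}$ compute

$$\mathcal{W}(i) = h^{-1}(x_{t_k}) \sum_{j=0}^{r/\Delta_2 - 1} g\big(Z_j^{0,x_{t_k}}(i)\big)\,\big(W_{j+1}(i) - W_j(i)\big)$$

where $W_j(i)$ are the sample points of the Brownian motion used previously to generate the trajectory $Z_{0:K}^{0,x_{t_k}}(i)$.

4. For each sampled trajectory $i \in \{1,\ldots,N\}$ compute

$$\mathcal{T}(i) = \phi\big(Z_K^{0,x_{t_k}}(i)\big) + \sum_{j=0}^{K-1} \ell\big(Z_j^{0,x_{t_k}}(i)\big)\Delta_2$$

5. Compute

$$\hat{u}_{t_k}(x_{t_k}) = \frac{1}{\sum_{i=1}^N \exp\big[-\frac{1}{\gamma}\mathcal{T}(i)\big]} \sum_{i=1}^N \exp\Big[-\frac{1}{\gamma}\mathcal{T}(i)\Big]\,\frac{\mathcal{W}(i)}{r}$$

which gives a (naive) Monte Carlo approximation of the optimal control. Let $\hat{u}_t(x_t) = \hat{u}_{t_k}(x_{t_k})$ over $t \in [k\Delta_1, (k+1)\Delta_1)$.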


This algorithm is easily implementable. The numerical approximation of the stochastic differential equation
(11) is known as the Euler-Maruyama method and is the simplest numerical scheme for approximating stochastic
differential equations. This numerical approximation may be generalised [18], although care must be taken to
ensure that sufficient gains warrant the sharp increase in complexity that accompanies higher-order numerical
approximation schemes.
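The "At time $t_k$ do" steps of Algorithm 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions rather than a reference implementation: the function name `mc_control` and its batched-callable interface are hypothetical, the Moore-Penrose pseudo-inverse is used as one possible choice of left inverse of $h(x)$, and the path costs are shifted by their minimum before exponentiation purely for numerical stability (the shift cancels in the self-normalised ratio).

```python
import numpy as np

def mc_control(x, f, g, h, phi, ell, gamma, T, dt2, r, n_paths, rng):
    """Monte Carlo estimate of the control at state x.

    x: state, shape (n,). The model functions are batched (illustrative API):
    f(Z) -> (N, n), g(Z) -> (N, n, d), phi(Z) -> (N,), ell(Z) -> (N,),
    and h(x) -> (n, m) is evaluated at the current state only.
    """
    K = int(round(T / dt2))           # Euler-Maruyama steps over the horizon [0, T]
    r_steps = int(round(r / dt2))     # steps used to approximate the limit r -> 0
    Z = np.tile(np.asarray(x, dtype=float), (n_paths, 1))
    d = g(Z).shape[2]                 # dimension of the Brownian motion
    noise_sum = np.zeros_like(Z)      # sum_j g(Z_j) (W_{j+1} - W_j) over the first r_steps
    running_cost = np.zeros(n_paths)
    for j in range(K):
        dW = rng.normal(0.0, np.sqrt(dt2), size=(n_paths, d))
        gdW = np.einsum("nij,nj->ni", g(Z), dW)
        if j < r_steps:
            noise_sum += gdW
        running_cost += ell(Z) * dt2  # left Riemann sum of the running cost
        Z = Z + f(Z) * dt2 + gdW      # Euler-Maruyama step of the uncontrolled SDE
    cost = phi(Z) + running_cost      # path cost of each simulated trajectory
    w = np.exp(-(cost - cost.min()) / gamma)  # shifted for numerical stability
    w /= w.sum()                      # self-normalised weights
    h_left_inv = np.linalg.pinv(h(x)) # assumption: pseudo-inverse as the left inverse
    return h_left_inv @ (w @ noise_sum) / r

if __name__ == "__main__":
    # Toy scalar problem: dX = u dt + sigma dW, terminal cost q x^2 / 2, no running cost.
    sigma, q, R, T = 0.5, 1.0, 1.0, 1.0
    gamma = sigma ** 2 * R            # chosen so that gamma * h R^{-1} h^T = g g^T
    u_hat = mc_control(
        x=np.array([1.0]),
        f=lambda Z: np.zeros_like(Z),
        g=lambda Z: np.full((Z.shape[0], 1, 1), sigma),
        h=lambda x: np.array([[1.0]]),
        phi=lambda Z: 0.5 * q * Z[:, 0] ** 2,
        ell=lambda Z: np.zeros(Z.shape[0]),
        gamma=gamma, T=T, dt2=0.01, r=0.1, n_paths=20000,
        rng=np.random.default_rng(0))
    print("estimate:", u_hat, "reference:", -q / (R + q * T))
```

The toy problem in the usage block is linear-quadratic, for which a standard Riccati computation gives the exact control $-qx/(R + qT)$; this provides a reference value against which the Monte Carlo estimate can be compared. The path loop is written sequentially in time but vectorised across the $N$ sample paths, reflecting the remark that the simulation can be parallelised.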

The error in computing the approximate control signal at the discrete time sites is a mix of the error introduced
by the Monte Carlo sampling (known as the statistical error) and the error introduced by the approximation
of the limit and the time-discretisation (known as the discretisation error). At those specific
discretised time sites we note the following result.

Proposition 3. Suppose Assumption^andi^and the modelling hypotheses employed to this point hold. Suppose also 
that the system and value functionals are sufficiently regular. Given $x \in \mathbb{R}^n$, suppose Algorithm 1 is used to compute
$\hat{u}_t(x)$. Then there exists a positive constant $\rho_{\Delta_2}$, a function $\rho(x)$ satisfying $|\rho(x)| \le \rho_{\Delta_2}$ and a matrix $\Sigma(x)$ such that

$$\sqrt{N}\,\big(\hat{u}_t(x) - u_t^*(x) - \rho(x)\big) \to \mathcal{N}(0, \Sigma(x))$$

where convergence is 'in distribution' with the number, $N$, of Monte Carlo runs; see Algorithm 1. Also, $\lim_{\Delta_2 \to 0} \rho_{\Delta_2} = 0$.

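Proof. Let $u^*$ denote the optimal control defined by (12). For a fixed $r > 0$ approximate the limit defining $u^*$ by

$$u^*_{t,r}(x) := \frac{1}{r}\,h^{-1}(x)\,\frac{\mathbb{E}\left[\exp\left(-\frac{1}{\gamma}\phi\big(Z_{t+T}^{t,x}\big) - \frac{1}{\gamma}\int_t^{t+T}\ell\big(Z_s^{t,x}\big)\,ds\right)\int_t^{t+r} g\big(Z_s^{t,x}\big)\,dW_s\right]}{\mathbb{E}\left[\exp\left(-\frac{1}{\gamma}\phi\big(Z_{t+T}^{t,x}\big) - \frac{1}{\gamma}\int_t^{t+T}\ell\big(Z_s^{t,x}\big)\,ds\right)\right]}.$$

For $r$ small enough we have $|u^*(x) - u^*_{t,r}(x)| < \epsilon$ with $\epsilon$ to be chosen later. Let $\tilde{u}^*_t$ be the approximation to $u^*_{t,r}$ found
purely as a result of the discretized path approximation (associated with step-size $\Delta_2$). Then, given the convergence
results for the Euler-Maruyama method [18][2], it follows that for $\Delta_2$ small enough, there exists a constant $c$ such
that $|u^*_{t,r}(x) - \tilde{u}^*_t(x)| \le c\Delta_2$ for all $r > 0$. Using the triangle inequality,

$$|u^*(x) - \tilde{u}^*_t(x)| \le c\Delta_2 + \epsilon =: \rho_{\Delta_2} \qquad (13)$$

Choosing $\epsilon = \Delta_2$ yields $\lim_{\Delta_2 \to 0}\rho_{\Delta_2} = 0$. We note that $\hat{u}_t(x)$ is a Monte Carlo approximation of $\tilde{u}^*_t(x)$. We denote by
$W_{0:K}$ a realised sequence of the discretized Brownian motion associated with the Euler-Maruyama discretization of
(11) and by $Z_{0:K}(W_{0:K})$ the discrete path associated with it. Call $\mathbb{P}$ the natural measure on the path space $\{W_{0:K}\}$.
Define

$$G(W_{0:K}) := \exp\left(-\frac{1}{\gamma}\Big[\phi\big(Z_K(W_{0:K})\big) + \sum_{l=1}^{K-1}\ell\big(Z_l(W_{0:K})\big)\Delta_2\Big]\right)$$

and consider the path measure $\mathbb{Q}$ obtained by the relation

$$d\mathbb{Q} = \frac{G(W_{0:K})}{\mathbb{E}_{\mathbb{P}}[G(W_{0:K})]}\,d\mathbb{P}$$

Define the function $F(W_{0:K}) = \sum_{j=1}^{r/\Delta_2}(W_j - W_{j-1})$, i.e. the sum of the first $r/\Delta_2$ Brownian increments. We have that

$$r\,\tilde{u}^*_t(x) = \mathbb{E}_{\mathbb{Q}}[F(W_{0:K})]$$

When simulating paths in Algorithm 1 we sample from $\mathbb{P}$, work with the unnormalised measure $d\tilde{\mathbb{Q}} := G(W_{0:K})\,d\mathbb{P}$, and use self-normalized
importance sampling to compute $r\,\hat{u}_t(x)$. We know [5] that self-normalized importance sampling is asymptotically
unbiased and moreover a central limit theorem holds if

$$\int \big[1 + F^2\big]\,\frac{d\mathbb{Q}}{d\mathbb{P}}\,d\mathbb{Q} < \infty$$

Here, we have

$$\int \big[1 + F^2\big]\,\frac{d\mathbb{Q}}{d\mathbb{P}}\,d\mathbb{Q} \le \frac{1}{\mathbb{E}_{\mathbb{P}}[G(W_{0:K})]^2}\int \big(1 + F^2\big)\,d\tilde{\mathbb{Q}}$$

Moreover $\int F^2\,d\tilde{\mathbb{Q}} = \mathbb{E}_{\mathbb{P}}[F^2 G] < \infty$ thanks to the fact that $G$ is bounded. Therefore we have

$$r\sqrt{N}\,\big(\hat{u}_t(x) - \tilde{u}^*_t(x)\big) \to \mathcal{N}\big(0, \tilde{\Sigma}(x)\big)$$

where $\tilde{\Sigma}(x) = \int [F - \mathbb{Q}(F)]^2\,\frac{d\mathbb{Q}}{d\mathbb{P}}\,d\mathbb{Q}$. Convergence is in the sense of distribution with $N$. Now divide by $r$ (so that $\Sigma(x) = \tilde{\Sigma}(x)/r^2$), add and
subtract $u^*(x)$, call $\rho(x) = \tilde{u}^*_t(x) - u^*(x)$ and use (13) to prove the convergence result. □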













The asymptotic bias in the preceding error result can be reduced by decreasing $\Delta_2$ or via a reduction in the
horizon length $T$. The variance can be reduced by increasing $N$ or through some variation of naive sampling such
as improved importance sampling or additionally via particle methods and resampling schemes [5][16][32] etc. The
role of the parameter $r$ with respect to the variance and the bias in the error approximation can be important; see
[2] for a first study of this issue. Note also that

$$\int [F - \mathbb{Q}(F)]^2\,\frac{d\mathbb{Q}}{d\mathbb{P}}\,d\mathbb{Q} \le \frac{\mathrm{Var}_{\mathbb{Q}}(F)}{\mathbb{E}_{\mathbb{P}}[G]}$$

and therefore the variance is intimately connected to $\mathbb{E}_{\mathbb{P}}[G]$, i.e. the interplay between the cost and dynamics of the
uncontrolled SDE. Such performance questions may be explored in future work; see also [37][2].
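One practical way to monitor how strongly the exponentiated path costs concentrate the weights is the effective sample size of the self-normalised weights. This diagnostic is a standard importance-sampling tool and is offered here only as a hedged illustration; it is not part of the analysis above, and the function name `effective_sample_size` is hypothetical.

```python
import numpy as np

def effective_sample_size(path_costs, gamma):
    """Effective sample size (sum w)^2 / sum w^2 of the weights w_i = exp(-cost_i / gamma).

    Values close to the number of paths indicate nearly uniform weights;
    values close to 1 indicate that a single path dominates the estimate.
    """
    costs = np.asarray(path_costs, dtype=float)
    w = np.exp(-(costs - costs.min()) / gamma)   # shift for numerical stability
    return float(w.sum() ** 2 / np.sum(w ** 2))

if __name__ == "__main__":
    print(effective_sample_size([1.0, 1.0, 1.0, 1.0], gamma=0.5))   # uniform weights
    print(effective_sample_size([0.0, 100.0, 100.0], gamma=1.0))    # one dominant path
```

A small effective sample size relative to $N$ suggests that the cost landscape and the uncontrolled dynamics are poorly matched, in which case a larger $N$ or an improved sampling scheme may be warranted.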

Going forward with the analysis we use the following assumption. 


Assumption 6. Suppose that, for $N$ big enough, $(\hat{u}_t(x) - u^*(x)) \sim \mathcal{N}\big(\rho(x), \tfrac{1}{N}\Sigma(x)\big)$ and $\Sigma(x)$ is bounded.


This assumption is just an invocation of the central limit type of result in Proposition 3 (which states that with
$N$ increasing, the distribution of the random part of the control approximation can be assumed Gaussian).
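The $1/\sqrt{N}$ scaling behind this assumption can be observed on a toy self-normalised importance sampling problem. The sketch below is an illustration only: the base measure $\mathcal{N}(0,1)$, the weight function and the names `snis_estimate` and `empirical_std` are assumptions chosen so that the target mean ($0.5$) is known in closed form.

```python
import numpy as np

def snis_estimate(n, rng):
    """Self-normalised importance sampling estimate of E_Q[z], where samples
    come from P = N(0, 1) and Q is proportional to exp(-(z - 1)^2 / 2) dP.
    For this toy choice Q = N(0.5, 0.5), so the exact answer is 0.5."""
    z = rng.normal(size=n)
    w = np.exp(-0.5 * (z - 1.0) ** 2)   # unnormalised weights
    return np.sum(w * z) / np.sum(w)

def empirical_std(n, repeats, rng):
    """Standard deviation of the estimator over independent repetitions."""
    return float(np.std([snis_estimate(n, rng) for _ in range(repeats)]))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    for n in (100, 10000):
        print(n, empirical_std(n, 400, rng))
```

Increasing the number of samples by a factor of 100 should reduce the empirical standard deviation by roughly a factor of 10, in line with a variance that decays like $1/N$.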

We can now state the main stability result of this section. 


Theorem 3. Suppose Assumptions 1, 2, 3, 4, 5, 6 and the modelling hypotheses outlined to this point hold. Define
f'.- C 5 (l+^) andX\- Given xeW^, suppose Algorithmf^is used to compute tit [x). With Ai,A 2 > 0 small enough 
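and $N$ large enough, there exist $\lambda \ge \theta > 0$ and a pair of positive constants $M_1, M_2 > 0$ such that

$$\mathbb{E}\,|X^{0,x_0,\hat{u}}(t)|^p \le M_1 \beta e^{-\frac{\theta}{4}t}|x_0|^p + M_2 m_\delta, \quad \forall\, t \ge 0$$

and $\lim_{\Delta_1,\Delta_2 \to 0,\, N \to \infty}\theta = \lambda$ (where $\lambda$ is the rate of convergence of the optimal control; see Theorem 1).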

Proof. Since the error $\hat{u}_{t_k} - u^*_{t_k}$ is of the mixed type, we call on Corollary 1. From Proposition 3 it follows that there
exists $\Delta_2$ small enough so that the deterministic part of the controller approximation error is small. Similarly, from
Proposition 3 it follows that there exists $N$ large enough so that the variance $\tfrac{1}{N}\Sigma(x)$ is small. Assumption 6 imposes
normality on the error distribution. Corollary 1 applies immediately. Picking $\Delta_1$ small enough to invoke Corollary 2
completes the proof. □

It is practically feasible that such parameters, $N$ (or $\Delta_2$), could be made large (small) enough to ensure the
variance (bias) in the approximation error is viable (and this is seen in practice where Algorithm 1 has been readily
applied). However, the strong proof of the preceding theorem relies on the (approximating) assumption that $\hat{u}_t - u^*$
is Gaussian. The error in this assumption, i.e. between the actual distribution and a Gaussian, decreases with
increasing $N$, and can be quantified via Berry-Esseen type bounds [5]. In any case, the preceding theorem is rigorous
under this assumption and, in practice, it provides justification for the stabilisation properties (seen in applications)
of Algorithm 1 for large enough $N$. A stronger result with finite $N$ would be difficult to obtain given that stochastic
differential equations driven by arbitrary (non-Brownian) random processes are well beyond the scope of this work.


5 Concluding Remarks 

In this work we explored the asymptotic stability of nonlinear stochastic RHC when the optimal controller is
computed only approximately. We considered a number of general classes of controller approximation error including
deterministic and probabilistic errors and even controller sample and hold errors. We also overviewed an
approximation method for computing the optimal RHC for nonlinear stochastic continuous-time systems. This method is
based on Monte Carlo integration approximation and originates in the work of Kappen [15][16].

While we studied the stability of various RHC approximations, we did not consider any measure of performance.
For example, it would be of interest to analyze (path-wise) the running cost error that arises due to the
approximation of the optimal controller. Inverse optimality and optimality gaps would also be of interest
here.

The incorporation of state constraints in RHC is common [29]. We note that it may be natural in some cases to
incorporate state constraints in the Monte Carlo based approximation algorithm detailed herein. For example, state
constraints may be enforced by simply restricting the evolution of the sampled trajectories (e.g. via dictating that
certain regions of the state space hold zero probability).
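One possible reading of this idea, offered only as a hedged sketch, is to assign zero weight to any sampled trajectory that enters a forbidden region, which is equivalent to giving that trajectory infinite cost. The function name `constrained_weights` and the boolean `violated` flags are hypothetical; how violations are detected along a path is left to the application.

```python
import numpy as np

def constrained_weights(path_costs, violated, gamma):
    """Self-normalised weights exp(-cost / gamma) with zero weight on every
    trajectory flagged as violating a state constraint.

    Returns an all-zero vector if every trajectory violates the constraint,
    in which case no estimate can be formed from this batch of paths.
    """
    costs = np.asarray(path_costs, dtype=float)
    ok = ~np.asarray(violated, dtype=bool)
    w = np.zeros_like(costs)
    if ok.any():
        w[ok] = np.exp(-(costs[ok] - costs[ok].min()) / gamma)
        w /= w.sum()
    return w

if __name__ == "__main__":
    print(constrained_weights([1.0, 2.0, 0.5], [False, False, True], gamma=1.0))
```

Discarding paths in this way reduces the effective number of samples, so the variance of the resulting control estimate may increase when the constraint is frequently violated by the uncontrolled dynamics.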

Efficient sampling and Monte Carlo simulation [16][32] that reduces the variance, and thus the error, in the Monte
Carlo based controller approximation is of interest. Other computational aspects of this approximation are also of
interest, particularly as they apply to high-dimensional implementation.

Footnote (Assumption 6): The point of this assumption is to impose normality on the distribution of the error $\hat{u}_t - u^*$. Regardless of the distribution, it is true that the
variance of the error decreases with increasing $N$ (at the rate $1/N$) and that the bias decreases continuously with $\Delta_2$. In practice,
with $N$ large enough, any error in applying this assumption is small and can be quantified via bounds of the Berry-Esseen type [5].




Finally, we mention that extensions which account for partial-information feedback may be important,
particularly in the stochastic framework where true state feedback is overly restrictive. In this setting, the coupling of
stochastic RHC, and particularly the Monte Carlo approximation algorithm, with sequential Monte Carlo
estimation/filtering (e.g. particle filtering [5]) would be a natural topic for further study.
Appendix: Proof of Lemma 1


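We start with the lower bound. Recall that $v(x) = \mathbb{E}\big[\phi(X_T) + \int_0^T \ell(X_s, u_s^*)\,ds\big]$ where we use the shorthand $X_s$ for the
solution of the system (1) driven by the optimal control with initial condition $x$. We have

$$\mathbb{E}[\phi(X_T)] \ge c_2\,\mathbb{E}|X_T|^p \ge c_2\,|\mathbb{E}[X_T]|^p$$

using the modelling hypotheses first and Jensen's inequality second. Moreover, we have

$$\begin{aligned}
\mathbb{E}\left[\int_0^T \ell(X_s, u_s^*)\,ds\right]
&\ge c_2\,\mathbb{E}\left[\int_0^T \big(|X_s|^p + |u_s^*|^p\big)\,ds\right] \\
&\ge c_2\,2^{1-p}\,\mathbb{E}\left[\int_0^T \big(|X_s| + |u_s^*|\big)^p\,ds\right] \\
&\ge \frac{c_2\,2^{1-p}}{c_1^p}\,\mathbb{E}\left[\int_0^T |f(X_s, u_s^*)|^p\,ds\right] \\
&= \frac{c_2\,2^{1-p}}{T^p c_1^p}\,\mathbb{E}\left[\int_0^T |T f(X_s, u_s^*)|^p\,ds\right]
\end{aligned}$$

where we have used the modelling hypotheses on $\ell(x,u)$, together with the fact that $|f(x,u)| \le c_1(|x| + |u|)$. Now it
follows that

$$\begin{aligned}
\mathbb{E}\left[\int_0^T \ell(X_s, u_s^*)\,ds\right]
&\ge \frac{c_2\,2^{1-p}}{T^{p-1} c_1^p}\,\mathbb{E}\left|\int_0^T f(X_s, u_s^*)\,ds\right|^p \\
&\ge \frac{c_2\,2^{1-p}}{T^{p-1} c_1^p}\,\left|\mathbb{E}\int_0^T f(X_s, u_s^*)\,ds\right|^p
 = \frac{c_2\,2^{1-p}}{T^{p-1} c_1^p}\,\big|x - \mathbb{E}[X_T]\big|^p
\end{aligned}$$

where we used Jensen's inequality twice. Putting together the bounds on $\mathbb{E}[\phi(X_T)]$ and $\mathbb{E}\big[\int_0^T \ell(X_s, u_s^*)\,ds\big]$ gives

$$\begin{aligned}
v(x) &\ge c_2\,|\mathbb{E}[X_T]|^p + \frac{c_2\,2^{1-p}}{T^{p-1} c_1^p}\,\big|x - \mathbb{E}[X_T]\big|^p \\
&\ge c\,\big(|\mathbb{E}[X_T]| + |x - \mathbb{E}[X_T]|\big)^p \\
&\ge c\,\big|\mathbb{E}[X_T] + x - \mathbb{E}[X_T]\big|^p = c\,|x|^p
\end{aligned}$$

for some constant $c > 0$, where we used the fact that all norms are equivalent on a finite dimensional vector space
followed by the triangle inequality. This completes the proof for the lower bound.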
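For reference, the elementary facts used in the lower bound can be stated explicitly. The following is a short worked summary for $p \ge 1$, $a, b \ge 0$ and a generic integrable process $\xi_s$ (a placeholder symbol introduced here); the last identity assumes that the stochastic integral in (1) is a true martingale, so that it has zero mean.

```latex
% Elementary inequalities behind the lower bound (p >= 1, a, b >= 0).
\begin{align*}
(a+b)^p &\le 2^{p-1}\,(a^p + b^p)
  && \text{(convexity of } z \mapsto z^p\text{)}, \\
\left|\int_0^T \xi_s\,ds\right|^p &\le T^{p-1}\int_0^T |\xi_s|^p\,ds
  && \text{(Jensen's inequality in time)}, \\
\mathbb{E}\left|\int_0^T \xi_s\,ds\right|^p &\ge \left|\mathbb{E}\int_0^T \xi_s\,ds\right|^p
  && \text{(Jensen's inequality in expectation)}, \\
\mathbb{E}\left[\int_0^T f(X_s,u_s^*)\,ds\right] &= \mathbb{E}[X_T] - x
  && \text{(the stochastic integral has zero mean)}.
\end{align*}
```

The first line gives the factor $2^{1-p}$, the second gives the factor $T^{p-1}$ in the denominator, and the last two convert the integral of the drift into the displacement $|x - \mathbb{E}[X_T]|$.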

We now turn to the upper bound and recall that, given an arbitrary admissible control law, the cost functional
$w(t,x,u)$ associated with (1) is

$$w(0,x,u) := \mathbb{E}\left[\phi(X_T) + \int_0^T \ell(X_s, u_s)\,ds\right]$$

We immediately have

$$v(x) := \inf_u w(0,x,u) \le w(0,x,0)$$

where $w(0,x,0)$ denotes the cost found after applying a constant zero control $u_t = 0$. Going forward, write $X_t :=
X_t^{0,x,0}$ for the solution of (1) with $X_0 = X_0^{0,x,0} = x$ and a constant zero control $u_t = 0$. Then

$$dX_t = f(X_t, 0)\,dt + g(X_t, 0)\,dW_t$$

Note that if $u_t = 0$ is not admissible we may substitute some other (sub-optimal) constant control signal. Then, for
all $p \ge 1$ with $t \ge 0$, it is known that given the existence of solutions to (1) it holds that

$$\mathbb{E}|X_t|^p \le c\,(1 + |x|^p)\,e^{ct}$$

for some finite $c > 0$. This, together with the assumptions on the cost, gives

$$\begin{aligned}
w(0,x,0) &= \mathbb{E}\left[\phi(X_T) + \int_0^T \ell(X_s, 0)\,ds\right] \\
&\le \mathbb{E}\big[c_3(1 + |X_T|^p)\big] + \mathbb{E}\left[\int_0^T c_3\big(1 + |X_s|^p + 0\big)\,ds\right] \\
&\le c_3\left[1 + T + c\,(1 + |x|^p)\,e^{cT} + \int_0^T c\,(1 + |x|^p)\,e^{cs}\,ds\right] \\
&\le c_3\Big[1 + T + (1 + c)\,(1 + |x|^p)\,e^{cT}\Big]
\end{aligned}$$

which, after gathering constants, completes the proof concerning the upper bound.

Bringing everything together, it follows that there exists a pair of positive constants $c_4, c_5$, depending only on
$p$, $T$, $c_1$, $c_2$, $c_3$, such that $c_4|x|^p \le v(x) \le c_5(1 + |x|^p)$, $\forall x \in \mathbb{R}^n$, and $v(x) \to \infty$ with $|x| \to \infty$. □






















